Abstract

We give topological and algebraic characterizations as well as language theoretic
descriptions of the following subclasses of first-order logic FO[<] for $\omega$-languages: $\Sigma_2$,
$\mathrm{FO}^2$, $\mathrm{FO}^2 \cap \Sigma_2$, and $\Delta_2$ (and by duality $\Pi_2$ and $\mathrm{FO}^2 \cap \Pi_2$). These descriptions extend the
respective results for finite words. In particular, we relate the above fragments to language
classes of certain (unambiguous) polynomials. An immediate consequence is the decidability
of the membership problem of these classes, but this was shown before by Wilke [31]
and Bojańczyk [2] and is therefore not our main focus. The paper is about the interplay of
algebraic, topological, and language theoretic properties.

1 Introduction 

The algebraic approach is fundamental for the understanding of regular languages. It has been
particularly fruitful for fragments of first-order logic over finite words. For example, a result of
Wilke and Thérien is that $\mathrm{FO}^2$ and $\Delta_2$ have the same expressive power [25], where the latter class
by definition denotes $\Sigma_2 \cap \Pi_2$. Further results are language theoretic and (very often decidable)
algebraic characterizations of logical fragments, see e.g. [24] or [8] for surveys. Several results for
finite words have been extended to other structures such as trees and other graphs, see [29] for a
survey. More recently, $\mathrm{FO}^2$, $\Delta_2$, and $\Sigma_2$ have been characterized for Mazurkiewicz traces [9, 14];
$\Delta_2$ and the Boolean closure of $\Sigma_1$ have been characterized for unranked trees [3, 4]. For some
characterizations over finite words, it has been shown that they cannot be generalized; e.g. over
unranked trees, it turned out that $\mathrm{FO}^2$ and $\Delta_2$ are incomparable [1]. For infinite words, the
expressive power of $\mathrm{FO}^2$ is not equal to $\Delta_2$, since saying that letters a and b appear infinitely
often, but c only finitely many times is $\mathrm{FO}^2$-definable, but there is neither a $\Sigma_2$-formula nor a
$\Pi_2$-formula specifying this language.

The results about finite words do not translate directly to infinite words as neither $\Sigma_2$ nor $\Pi_2$
copes with the exact alphabetic information which letters appear infinitely often, see Figures 1(a)
and 1(b).



Figure 1: The fragments $\Sigma_2$, $\Pi_2$, and $\mathrm{FO}^2$ over (a) finite words and (b) finite and infinite words

Our results deepen the understanding of first-order fragments over infinite words. A decidable
characterization of the membership problem for $\mathrm{FO}^2$ over infinite words has been given in the
habilitation thesis of Wilke [31]. Recently, decidability for $\Sigma_2$ has been shown independently by
Bojańczyk [2]. Language theoretic and decidable algebraic characterizations of the fragment $\Sigma_1$
and of its Boolean closure can be found in [16, 18].

We introduce two generalizations of the usual Cantor topology for infinite words. One of our
first results is a characterization of $\Sigma_2$-definability for languages in $\Gamma^\infty$. This characterization
consists of two components: The first one is an algebraic property of the syntactic monoid and the
second part is requiring that L is open in some alphabetic topology. Both properties are decidable.

Our second result is that a regular language is $\mathrm{FO}^2$-definable if and only if its syntactic monoid
is in the variety DA. (The result is surprising in the sense that it contradicts a statement in [31].)
In addition, we show that a language is definable in $\mathrm{FO}^2$ if and only if it is closed in some further
refined alphabetic topology and if it is weakly recognizable by a monoid in DA. In particular,
weak recognition and strong recognition do not coincide for the variety DA. This seems to be a
new result as well. We also contribute a language theoretic characterization of $\mathrm{FO}^2$ in terms of
unambiguous polynomials with additional constraints on the letters which occur infinitely often.

Other results of our paper are the characterization of $\mathrm{FO}^2 \cap \Sigma_2$ as the class of unambiguous
polynomials and of $\Delta_2$ in terms of unambiguous polynomials in some special form and also in terms
of deterministic languages. It follows already from this description that $\Delta_2$ is a proper subset of
$\mathrm{FO}^2$. Furthermore, we show that the equality of $\mathrm{FO}^2$ and $\Delta_2$ holds relativized to some fixed set of
letters which occur infinitely often. If this set of letters is empty, we obtain the situation for finite
words as a special case. Finally, we relate topological constructions such as interior and closure
with membership in the fragments under consideration. A brief summary of the results for the
various fragments can be found in Section 7 at the end of this paper.

For basic notions on languages of infinite words we refer to standard references such as [16, 27].
Most results of the present paper are from its conference version [10], but for lack of space they
appeared in many cases without proof. The present journal version gives full proofs and some
new material. In particular, we give a new characterization of $\omega$-regular $\Delta_2$-languages involving
deterministic and complement-deterministic languages, cf. Corollary 6.9.

2 Preliminaries 

Words Throughout, $\Gamma$ is a finite alphabet, $A \subseteq \Gamma$ is a subset of the alphabet, u, v, w are finite
words, and $\alpha$, $\beta$, $\gamma$ are finite or infinite words. If not specified otherwise, then in all examples
we assume that $\Gamma$ has three different letters a, b, c. By $u \le \alpha$ we mean that u is a prefix of
$\alpha$. By alph($\alpha$) we denote the alphabet of $\alpha$, i.e., the letters occurring in the sequence $\alpha$. As
usual, $\Gamma^*$ is the free monoid of finite words over $\Gamma$. The neutral element is the empty word 1.
If L is a subset of a monoid, then $L^*$ is the submonoid generated by L. For $L \subseteq \Gamma^*$ we let
$L^\omega = \{u_1 u_2 \cdots \mid u_i \in L \text{ for all } i \ge 1\}$ be the set of infinite products. We also let $L^\infty = L^* \cup L^\omega$.
A natural convention is $1^\omega = 1$. Thus, $L^\infty = L^\omega$ if and only if $1 \in L$.

We write im($\alpha$) for those letters in alph($\alpha$) which have infinitely many occurrences in $\alpha$. The
notation has been introduced in the framework of so called complex traces, see e.g. [12] for a
detailed discussion of this concept. The notation im($\alpha$) refers to the imaginary part and we adopt
it here, but for our purpose it might be also convenient to remember im($\alpha$) as an abbreviation for
letters which appear infinitely many times in $\alpha$. Sets of the form $A^{\mathrm{im}}$ play a crucial role in our
paper. By definition, $A^{\mathrm{im}}$ is the set of words $\alpha$ such that im($\alpha$) = A. Note that $\Gamma^* = \emptyset^{\mathrm{im}}$. The
set $\Gamma^\infty$ is the disjoint union over all $A^{\mathrm{im}}$.
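
As a small illustration of alph and im, the following sketch handles ultimately periodic words $u v^\omega$. The encoding as a pair of Python strings (with an empty period for a finite word) and the function names are assumptions made for this example only; they are not part of the paper.

```python
def alph(u: str, v: str = "") -> set:
    """Letters occurring in u v^omega (v == "" encodes the finite word u)."""
    return set(u) | set(v)


def im(u: str, v: str = "") -> set:
    """Letters occurring infinitely often in u v^omega; empty for finite words."""
    return set(v)


def in_A_im(A, u: str, v: str = "") -> bool:
    """Membership of u v^omega in A^im, i.e. im(u v^omega) == A."""
    return im(u, v) == set(A)


# c occurs only finitely often in cab(ab)^omega, so the word lies in {a,b}^im.
print(in_A_im({"a", "b"}, "cab", "ab"))
# Every finite word lies in the class of the empty set.
print(in_A_im(set(), "abc"))
```

Since every word has exactly one set im, each word belongs to exactly one class $A^{\mathrm{im}}$, which is the disjoint-union statement above.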

Logic and regular sets We assume that the reader is familiar with basic concepts in formal
language theory. Our focus is on regular languages. If $L \subseteq \Gamma^\infty$ is regular, then we may think that
its finitary part $L \cap \Gamma^*$ is specified by some NFA and that its infinitary part $L \cap \Gamma^\omega$ is specified by
some Büchi automaton. For a unified model to accept regular languages in $\Gamma^\infty$ it is convenient to
consider an extended Büchi automaton which has a finite set of states Q and two types of accepting
states, a set of final states $F \subseteq Q$ for accepting finite words and a set of repeated states $R \subseteq Q$ for
accepting infinite words. Thus, this model yields also a natural definition of deterministic regular
languages in $\Gamma^\infty$, see below for more details.

We focus on regular languages which are given by first-order sentences in FO[<]. Thus, atomic
predicates are $\lambda(x) = a$ and $x < y$ saying that position x in a word $\alpha$ is labeled with $a \in \Gamma$
and position x is smaller than y, respectively. By $\mathrm{FO}^2$ we mean FO[<]-sentences which use at
most two names x and y as variables or the class of languages specified by such formulas. It is
well-known that three variables are sufficient to express any FO[<]-property (see e.g. [7]), whereas
$\mathrm{FO}^2$ is a proper subclass. Similarly, $\Sigma_2$ means FO[<]-sentences which are in prenex normal form
and which start with a block of existential quantifiers, followed by a block of universal quantifiers
and a Boolean combination of atomic formulas. A $\Pi_2$-formula means a negation of a $\Sigma_2$-formula.
The notations $\Sigma_2$ and $\Pi_2$ refer also to the corresponding language classes. The class $\Delta_2$ means the
class of $\Sigma_2$-formulas which have an equivalent $\Pi_2$-formula. But the notion of equivalence depends
on the set of models we use.

If the models are finite words, then a result of Thérien and Wilke [25] states $\mathrm{FO}^2 = \Delta_2$.
Moreover, $\mathrm{FO}^2$ is the class of regular languages in $\Gamma^*$ which are recognized by some finite monoid
in the variety DA and a classical result of Schützenberger shows that DA also coincides with
unambiguous polynomials [21]. The variety DA has been baptized this way because it means
$\mathcal{D}$-classes are aperiodic. More precisely, DA contains those finite monoids, where all regular
$\mathcal{D}$-classes are aperiodic semigroups. We refer to [23, 8] for more background on the class DA. It is
also the class of finite monoids defined e.g. by equations of type $(xy)^\omega = (xy)^\omega y (xy)^\omega$. Another
characterization says that DA is defined by finite monoids M satisfying $e = ese$ for all idempotents
e (i.e., $e^2 = e$) and for all $s = s_1 \cdots s_n$ where $e \in M s_i M$ for each i, see e.g. [5, 19, 28]. This is the
definition which we use below.

Saying that formulas are equivalent if they agree on all finite and infinite words refines the 
notion of equivalence for formulas and changes the picture. This is actually the starting point 
of this work. So, in this paper models are finite and infinite words. We are mainly interested 
in infinite words, but it does no harm to include finite words, and this makes the situation more 
uniform and the results on finite words reappear as special cases. See e.g. Theorem 5.11 which 
implies that $\mathrm{FO}^2 = \Delta_2$ for finite words by choosing $A = \emptyset$.

Recognizability by finite monoids By M we denote a finite monoid. We always assume that
M is equipped with a partial order $\le$ being compatible with the multiplication, i.e., $u \le v$ implies
$sut \le svt$ for all $s, t, u, v \in M$. If not specified otherwise, we may choose $\le$ to be the identity
relation.

For an idempotent element $e \in M$ we define $M_e = \{s \in M \mid e \in MsM\}^*$, i.e., $M_e$ is the
submonoid of M which is generated by factors of e. If M has a generating set $\Gamma$, then $M_e$
is generated by $\{a \in \Gamma \mid e \in MaM\}$. We can think of this set as the maximal alphabet of the
idempotent e. We say that an idempotent e is locally top (locally bottom, resp.) if $ese \le e$
($ese \ge e$, resp.) for all $s \in M_e$. By DA we denote the class of finite monoids such that $ese = e$ for
all idempotents $e \in M$ and all $s \in M_e$. Thus, it is the class of finite monoids where idempotents
are locally top and locally bottom.

Remark 2.1 Assume that M is generated by $\Gamma$. In order to test that $M \in$ DA, it is enough to check
for all $e = e^2 \in M$ and all $a \in \Gamma$ with $e \in MaM$ that we have $eae = e$. Indeed, consider $s \in M_e$
and $a \in \Gamma$ with $e \in MaM$. By induction $ese = e$, and it is enough to see that $esae = e$. Now,
$ese = e$ implies that the element es is idempotent and we have $es \in MaM$, too. The result follows:

esae = esaese = esese = e. 

Example 2.2 Let M = {1, a, b, c, ba, 0} be the monoid having the following description: All
elements are idempotent except for ba. We have $(ba)^2 = ab = 0$, and 0 behaves like a zero, i.e.,
$0x = x0 = 0$ for all x. Moreover, we have the equations:

ca = a, ac = c, cb = c, bc = b, ab = 0.

The monoid M is not in DA, because $a^2 = a = cba \in MbM$, but $aba = 0 \ne a$. However, the
submonoid $N = M \setminus \{c\}$ is in DA. Visual representations of M and N in terms of so-called
egg-box diagrams (see e.g. [17]) can be found in Figures 2(a) and 2(b). ◇
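
The test of Remark 2.1 can be run mechanically on this example. The following is a minimal sketch, assuming that elements of M are encoded as the strings "", "a", "b", "c", "ba", "0" (with "" for the neutral element 1) and that products are computed by rewriting with the equations listed above; the names `mul` and `is_in_DA` are hypothetical helpers introduced for this illustration.

```python
from itertools import product

# Rewriting rules read off the equations of Example 2.2 (generators are idempotent).
RULES = {"aa": "a", "bb": "b", "cc": "c", "ca": "a", "ac": "c", "cb": "c", "bc": "b"}


def mul(x: str, y: str) -> str:
    """Product in M, computed by reducing the concatenation to a normal form."""
    word = x + y
    while True:
        if "0" in word or "ab" in word:  # ab = 0 and 0 is absorbing
            return "0"
        for lhs, rhs in RULES.items():
            if lhs in word:
                word = word.replace(lhs, rhs, 1)
                break
        else:
            return word  # no rule applies: normal form reached


def is_in_DA(elements, generators) -> bool:
    """Check eae = e for all idempotents e and generators a with e in MaM."""
    for e in elements:
        if mul(e, e) != e:
            continue  # e is not idempotent
        for a in generators:
            in_ideal = any(mul(mul(x, a), y) == e
                           for x, y in product(elements, repeat=2))
            if in_ideal and mul(mul(e, a), e) != e:
                return False
    return True


M = ["", "a", "b", "c", "ba", "0"]
N = ["", "a", "b", "ba", "0"]
print(is_in_DA(M, "abc"), is_in_DA(N, "ab"))
```

For M the check fails at the idempotent a and the letter b, which is exactly the witness $a = cba$, $aba = 0$ given in the example.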

Let $L \subseteq \Gamma^\infty$ be a language. The syntactic preorder $\le_L$ over $\Gamma^*$ is defined as follows. We let
$u \le_L v$ if for all $x, y, z \in \Gamma^*$ we have both implications:

$x v y z^\omega \in L \Rightarrow x u y z^\omega \in L$ and $x (v y)^\omega \in L \Rightarrow x (u y)^\omega \in L$.

Let us recall that $1^\omega = 1$. Two words $u, v \in \Gamma^*$ are syntactically equivalent, written as $u \equiv_L v$, if
both $u \le_L v$ and $v \le_L u$. This is a congruence and the congruence classes $[u]_L = \{v \in \Gamma^* \mid u \equiv_L v\}$
form the syntactic monoid Synt(L) of L. The preorder $\le_L$ on words induces a partial order $\le_L$
on congruence classes, and (Synt(L), $\le_L$) becomes an ordered monoid. It is a well-known classical
result that the syntactic monoid of a regular language $L \subseteq \Gamma^\infty$ is finite, see e.g. [16, 27]. Moreover,
in this case L can be written as a finite union of languages of type $[u]_L [v]_L^\omega$ where $u, v \in \Gamma^*$ with
$uv \equiv_L u$ and $v^2 \equiv_L v$. In contrast to finite words, there exist non-regular languages in $\Gamma^\infty$ with a
finite syntactic monoid.

Now, let $h : \Gamma^* \to M$ be any surjective homomorphism onto a finite ordered monoid M and
let $L \subseteq \Gamma^\infty$. If the reference to h is clear, then we denote by [s] the set of finite words $h^{-1}(s)$ for
$s \in M$. We use the following terminology.

• $(s, e) \in M \times M$ is a linked pair, if $se = s$ and $e^2 = e$.

• h weakly recognizes L, if

$L = \bigcup \{[s][e]^\omega \mid (s, e) \text{ is a linked pair and } [s][e]^\omega \subseteq L\}$

• h strongly recognizes L (or simply recognizes L), if

$L = \bigcup \{[s][e]^\omega \mid (s, e) \text{ is a linked pair and } [s][e]^\omega \cap L \ne \emptyset\}$

• L is downward closed (on finite prefixes) for h, if $[s][e]^\omega \subseteq L$ implies $[t][e]^\omega \subseteq L$ for all
$s, t, e \in M$ where $t \le s$.

If L is regular, then the syntactic homomorphism strongly recognizes L.

Example 2.3 Let $\Gamma = \{a, b, c\}$ and L be one of the languages $\Gamma^* ab \Gamma^*$, $\Gamma^* ab \Gamma^\omega$, or $\Gamma^* ab \Gamma^\infty$.
The syntactic monoid of L is always the same. It has six elements and can be identified with the
monoid M = {1, a, b, c, ba, 0} defined in Example 2.2 such that the syntactic homomorphism maps
$\Gamma$ to the respective generators of M. Actually we have:

$\Gamma^* ab \Gamma^* = [0] = [0][1]^\omega$,

$\Gamma^* ab \Gamma^\omega = \bigcup \{[0][e]^\omega \mid 1 \ne e \in M\}$,

$\Gamma^* ab \Gamma^\infty = \bigcup \{[0][e]^\omega \mid e \in M\}$.

All of the above languages are strongly recognized by M (using the syntactic homomorphism). The
language $[0][a]^\omega$ is weakly recognized by M, but it is not strongly recognized because $ab(cbca)^\omega =
abc(bcac)^\omega \in [0][a]^\omega \cap [0][b]^\omega$ and $ab^\omega \in [0][b]^\omega \setminus [0][a]^\omega$. ◇

Lemma 2.4 Let $L \subseteq \Gamma^\infty$ be a regular language and let $h_L : \Gamma^* \to \mathrm{Synt}(L)$ be its syntactic
homomorphism. Then for all $s, t, e, f \in M$ such that $t \le s$, $f \le e$, and $[s][e]^\omega \subseteq L$ we have
$[t][f]^\omega \subseteq L$. In particular, L is downward closed (on finite prefixes) for $h_L$.

Proof: Let $u \in [s]$, $x \in [e]$ and let $v \in [t]$, $y \in [f]$. Now, $u x^\omega \in L$ implies $v x^\omega \in L$, which in turn
implies $v y^\omega \in L$. Since L is regular, $h_L$ strongly recognizes L; and we obtain $[t][f]^\omega \subseteq L$, because
$v y^\omega \in [t][f]^\omega \cap L$. □



Deterministic, complement-deterministic, and arrow languages Intuitively, the best way
to define deterministic languages is to say that a language is deterministic, if it is recognized by
a deterministic extended Büchi automaton with final and repeated states as described above.
Therefore, a regular language $L \subseteq \Gamma^\infty$ is deterministic if and only if its $\omega$-regular part $L \cap \Gamma^\omega$ can
be accepted by some deterministic Büchi automaton in the usual sense.

There is also a well-known tight connection to what we call here arrow languages $\overrightarrow{W}$: For
$W \subseteq \Gamma^*$ we define

$\overrightarrow{W} = \{\alpha \in \Gamma^\infty \mid \text{for every prefix } u \le \alpha \text{ there exists } uv \le \alpha \text{ with } uv \in W\}$.

Using Büchi automata, we see that a regular language $L \subseteq \Gamma^\infty$ is deterministic if and only
if we can write $L \cap \Gamma^\omega = \overrightarrow{W} \cap \Gamma^\omega$ for some regular $W \subseteq \Gamma^*$. Actually, a classical result of
Landweber yields a more precise statement: If $L \subseteq \Gamma^\omega$ is $\omega$-regular and $L = \overrightarrow{W} \cap \Gamma^\omega$ for some set
$W \subseteq \Gamma^*$, then W can be chosen to be regular, too (which means L is deterministic), see e.g. [27].
Therefore it is justified to take the weakest condition as a formal definition here. Moreover, as we
have not formally defined Büchi automata, we use the Landweber characterization as our working
definition: If we speak about a deterministic language then we are content with L being regular
and $L \cap \Gamma^\omega = \overrightarrow{W} \cap \Gamma^\omega$ for some set $W \subseteq \Gamma^*$. It is called complement-deterministic, if $\Gamma^\infty \setminus L$ is
deterministic. It is well-known and easy to see (e.g. with our working definition) that deterministic
languages are closed under finite union and finite intersection.

For example, if $W = \Gamma^* a$, then $\overrightarrow{W} \cap \Gamma^\omega$ is the deterministic $\omega$-regular language of words having
infinitely many a's. Its complement is not deterministic (if $|\Gamma| \ge 2$). Hence infinitely many a's
is not complement-deterministic. In particular, deterministic languages do not form a Boolean
algebra, whereas the class of languages which are simultaneously deterministic and
complement-deterministic does. Note that the class of arrow languages is not closed under finite intersection:
$\overrightarrow{\Gamma^* a} \cap \overrightarrow{\Gamma^* b}$ is deterministic but not an arrow language (in our sense) because the intersection is not
empty, e.g., it contains $(ab)^\omega$, but it does not contain any finite word.
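
For ultimately periodic words, membership in an arrow language of a regular set W can be tested directly on a DFA for W. The sketch below assumes a complete DFA given as a Python dictionary keyed by (state, letter) and encodes $u v^\omega$ as a pair of strings (an empty v stands for the finite word u); these encodings and the names `run` and `in_arrow` are assumptions of this illustration, not constructions from the paper.

```python
def run(delta, state, word):
    """Return the list of states reached after each letter of word."""
    visited = []
    for letter in word:
        state = delta[state, letter]
        visited.append(state)
    return visited


def in_arrow(delta, start, accepting, u, v=""):
    """Is u v^omega in the arrow language of W, where the DFA accepts W?"""
    states = run(delta, start, u)
    q = states[-1] if states else start
    if not v:
        # A finite word lies in the arrow language iff it lies in W itself.
        return q in accepting
    seen = {}   # state at a period boundary -> index of that period
    hits = []   # hits[i]: does the i-th copy of v visit an accepting state?
    while q not in seen:
        seen[q] = len(hits)
        visited = run(delta, q, v)
        hits.append(any(p in accepting for p in visited))
        q = visited[-1]
    # The periods from index seen[q] onwards repeat forever.
    return any(hits[seen[q]:])


# DFA for W = Gamma^* a over Gamma = {a, b}: state 1 means "last letter was a".
DELTA = {(0, "a"): 1, (0, "b"): 0, (1, "a"): 1, (1, "b"): 0}
print(in_arrow(DELTA, 0, {1}, "b", "ab"))   # infinitely many a's
print(in_arrow(DELTA, 0, {1}, "aaa", "b"))  # only finitely many a's
```

The infinite case checks that infinitely many prefixes lie in W, which for an infinite word is equivalent to the defining condition of the arrow language.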

Our definitions differ slightly from the notation used elsewhere, where $\overrightarrow{W}$ is commonly used as
the $\omega$-language of those infinite words with infinitely many prefixes in W, which is the set $\overrightarrow{W} \cap \Gamma^\omega$
in our notation. In our definition we have however a closure operator: $W \subseteq \overrightarrow{W} = W \cup (\overrightarrow{W} \cap \Gamma^\omega)$.
Moreover, the characterization of $\Delta_2$-languages is more natural in our definition. Also note that
if $L = \overrightarrow{W}$, then $W = L \cap \Gamma^*$. If we only have $L \cap \Gamma^\omega = \overrightarrow{W} \cap \Gamma^\omega$, then there are uncountably many
choices for W, in general.

Finite $\omega$-semigroups The notion of an $\omega$-semigroup has been introduced as a tool for language
varieties of finite and infinite words; and it leads, in particular, to an Eilenberg-type theorem,
see [16, 30]. Finite $\omega$-semigroups yield another possible framework to express most of our results.
Our focus is however to transfer results from finite words to infinite words using topology, so the
classical theory of recognition by finite monoids turned out to be suitable for our purposes. But
still it might be useful for a possible generalization to convert our results to the terminology of
$\omega$-semigroups. We refer to the textbook [16], where the theory has been nicely presented in detail.

3 The alphabetic topology and polynomials 

Topological information is crucial in our characterization results. Recall that a topology on a set X
is given by a family of subsets (called open subsets) such that a finite intersection and an arbitrary
union of open subsets is open. We define the alphabetic topology on the set $\Gamma^\infty$ by its basis, which
is given by all sets of the form $u A^\infty$ for $u \in \Gamma^*$ and $A \subseteq \Gamma$. Thus, a set $L \subseteq \Gamma^\infty$ is open if and only
if for each $A \subseteq \Gamma$ there is a set of finite words $W_A \subseteq \Gamma^*$ such that $L = \bigcup_{A \subseteq \Gamma} W_A A^\infty$. By definition, a
set is closed, if its complement is open; and it is clopen, if it is both open and closed. For example,
the sets $u A^\infty$ are clopen. In particular, the sets $A^\infty$ are clopen, too. A set of the form $A^{\mathrm{im}}$ is not
open unless $A = \emptyset$, it is not closed unless $A = \Gamma$.
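
To make the basic open sets concrete, the following sketch tests whether an ultimately periodic word $u v^\omega$ lies in a set of the form $w A^\infty$. As an assumption of this illustration only, words are encoded as pairs of Python strings (an empty v stands for the finite word u), and `in_basic_open` is a hypothetical helper name.

```python
def in_basic_open(w, A, u, v=""):
    """Is u v^omega in the basic open set w A^infinity?"""
    A = set(A)
    if not v:
        # Finite word: w must be a prefix and the remainder must be over A.
        return u.startswith(w) and set(u[len(w):]) <= A
    # Unroll enough copies of v so that the prefix of length len(w) is available.
    reps = max(0, len(w) - len(u)) // len(v) + 1
    unrolled = u + v * reps
    if not unrolled.startswith(w):
        return False
    # The suffix after w consists of the rest of the unrolled part followed by v^omega.
    return set(unrolled[len(w):]) | set(v) <= A


# c(ab)^omega lies in c{a,b}^infinity but not in {a,b}^infinity.
print(in_basic_open("c", {"a", "b"}, "c", "ab"))
print(in_basic_open("", {"a", "b"}, "c", "ab"))
```

With A empty the test accepts exactly the finite word w, which reflects that the basic set $w \emptyset^\infty$ is the singleton {w}.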

Note that in the alphabetic topology every singleton $u \in \Gamma^*$ is open since $u \emptyset^\infty = u \{1\} = \{u\}$.
Thus, $\Gamma^*$ is an open, discrete, and dense subset of $\Gamma^\infty$. The alphabetic topology is a refinement
of the usual Cantor topology, where the languages $\{u\}$ and $u \Gamma^\infty$ form a basis of (Cantor-)open
subsets for $u \in \Gamma^*$. The Cantor space $\Gamma^\infty$ is compact. As soon as $\Gamma$ has at least two letters more
sets are open in the alphabetic topology than in the Cantor topology. For example, the sets $u A^\infty$
being clopen in the alphabetic topology are neither open nor closed in the Cantor topology for

Remark 3.1 The space $\Gamma^\infty$ with the alphabetic topology is Hausdorff. It is compact if and only
if $|\Gamma| \le 1$. To see that it is not compact for $\Gamma = \{a, b\}$ note that $\Gamma^\infty$ is covered by $a^\infty$ together
with open sets of the form $u b \Gamma^\infty$ with $u \in \Gamma^*$. But for no finite subset $F \subseteq \Gamma^*$ do we have
$\Gamma^\infty = a^\infty \cup F b \Gamma^\infty$.

For a language L, its closure $\overline{L}$ is the intersection of all closed sets containing L. A word
$\alpha \in \Gamma^\infty$ belongs to $\overline{L}$ if for all open subsets $U \subseteq \Gamma^\infty$ with $\alpha \in U$ we have $U \cap L \ne \emptyset$. The
interior of L is the union of all open sets contained in L. It can be constructed as the complement
of the closure of its complement. For languages L and K we define the right quotient as a language
of finite words by $L/K = \{u \in \Gamma^* \mid u\alpha \in L \text{ for some } \alpha \in K\}$. In particular, we have

$L/A^\infty = \{u \in \Gamma^* \mid u\alpha \in L \text{ for some } \alpha \in A^\infty\}$.

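The following proposition gives a description of the closure in the alphabetic topology in terms
of arrow languages $\overrightarrow{W}$ plus some alphabetic restrictions.

Proposition 3.2 In the alphabetic topology we have $\overline{A^{\mathrm{im}}} = \bigcup_{A \subseteq B} B^{\mathrm{im}}$ and

$\overline{L} = \bigcup_{A \subseteq \Gamma} \left(\overrightarrow{L/A^\infty} \cap A^{\mathrm{im}}\right) = \bigcup_{A \subseteq \Gamma} \left(\overrightarrow{L/A^\infty} \cap \overline{A^{\mathrm{im}}}\right)$.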

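Proof: It is elementary to show $\overline{A^{\mathrm{im}}} = \bigcup_{A \subseteq B} B^{\mathrm{im}}$. We first show $\overline{L} \subseteq \bigcup_{A \subseteq \Gamma} (\overrightarrow{L/A^\infty} \cap A^{\mathrm{im}})$. Let
$\alpha \in \overline{L}$ with $\alpha \in A^{\mathrm{im}}$. For all prefixes u of $\alpha$ we find v such that $\alpha \in uvA^\infty$. We have $uvA^\infty \cap L \ne \emptyset$;
and thus $uv \in L/A^\infty$. This shows $\alpha \in \overrightarrow{L/A^\infty}$.

The inclusion $\bigcup_{A \subseteq \Gamma} (\overrightarrow{L/A^\infty} \cap A^{\mathrm{im}}) \subseteq \bigcup_{A \subseteq \Gamma} (\overrightarrow{L/A^\infty} \cap \overline{A^{\mathrm{im}}})$ is trivial.

Let now $\alpha \in \overrightarrow{L/A^\infty} \cap B^{\mathrm{im}}$ with $A \subseteq B$. Since $L/A^\infty \subseteq L/B^\infty$, we have $\alpha \in \overrightarrow{L/B^\infty} \cap B^{\mathrm{im}}$. Let
$u \in \Gamma^*$ with $\alpha = u\beta$ and $\beta \in B^\infty$. We have to show $uB^\infty \cap L \ne \emptyset$. Since $\alpha \in \overrightarrow{L/B^\infty}$ there is some
$v \in \Gamma^*$ with $uv \le \alpha$ and $uv \in L/B^\infty$. This means $uv\gamma \in L$ for some $\gamma \in B^\infty$. Since $\beta \in B^\infty$ we
have $v \in B^*$. Hence $v\gamma \in B^\infty$ and thus $uv\gamma \in uB^\infty \cap L$ as desired. □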

The following corollary generalizes a well-known fact for the Cantor topology to the (finer) 
alphabetic topology. This result will be used in Section 6. 

Corollary 3.3 Let $L \subseteq \Gamma^\infty$ be a regular language. Then its closure $\overline{L}$ in the alphabetic topology
is deterministic.

Proof: Deterministic languages are closed under finite union and finite intersection. For a letter
a the language $\overline{\{a\}^{\mathrm{im}}}$ is deterministic as it is the language of words having infinitely many a's.
Hence $\overline{A^{\mathrm{im}}} = \bigcap_{a \in A} \overline{\{a\}^{\mathrm{im}}}$ is deterministic, too. The result follows. □

Corollary 3.4 Given a regular language $L \subseteq \Gamma^\infty$, we can decide whether L is closed (open resp.,
clopen resp.).

Proof: We may assume that L is specified by some NFA for $L \cap \Gamma^*$ and by some Büchi automaton
for $L \cap \Gamma^\omega$. The construction of an NFA recognizing $L/A^\infty$ is standard. Since $L/A^\infty \subseteq \Gamma^*$ we can
assume that the NFA is deterministic, and we can view it as a (deterministic) Büchi automaton
recognizing $\overrightarrow{L/A^\infty} \cap \Gamma^\omega$. Intersection with $A^{\mathrm{im}}$ yields a Büchi automaton for $\overline{L} \cap A^{\mathrm{im}}$ and $A \ne \emptyset$.
Thus, we can test $\overline{L} \cap A^{\mathrm{im}} \subseteq L$ for all A. This implies that we can test $\overline{L} = L$. The result for
open and clopen follows since regular languages are effectively closed under complementation. □

Actually, we have a more precise statement than pure decidability. In the following, PSPACE 
denotes as usual the class of problems which can be decided by some polynomially space bounded 
(deterministic) Turing machine. 

Theorem 3.5 The following problem is PSPACE-complete:
Input: A Büchi automaton $\mathcal{A}$ with $L(\mathcal{A}) \subseteq \Gamma^\omega$.
Question: Is the regular language $L(\mathcal{A})$ closed?

Proof: We can check in PSPACE whether a regular language $L \subseteq \Gamma^\omega$ is closed: Let $L = L(\mathcal{A})$
for some non-deterministic Büchi automaton $\mathcal{A}$. We verify $L = \overline{L}$ using the characterization of $\overline{L}$
given in Proposition 3.2. We can check in PSPACE whether two Büchi automata are equivalent,
see [22]. In particular, we can check in PSPACE whether $L \cap A^{\mathrm{im}} = \overrightarrow{L/A^\infty} \cap A^{\mathrm{im}}$ for all $A \subseteq \Gamma$.

It is PSPACE-hard to decide whether a regular language $L \subseteq \Gamma^\omega$ is closed: We use a
reduction of the problem whether $L(\mathcal{A}) = \Gamma^*$ for some NFA $\mathcal{A}$, see [15]. We can assume that
$1 \in L(\mathcal{A})$. Let $c \notin \Gamma$ be a new letter. We can construct a non-deterministic Büchi automaton
$\mathcal{B}$ such that $L(\mathcal{B}) = \{w_1 c w_2 c \cdots \in (\Gamma \cup \{c\})^\omega \mid \exists i \colon w_i \in L(\mathcal{A})\}$. The closure of $L(\mathcal{B})$ is
$K = \{w_1 c w_2 c \cdots \in (\Gamma \cup \{c\})^\omega \mid \forall i \colon w_i \in \Gamma^*\} = (\Gamma^* c)^\omega$. Hence, $L(\mathcal{A}) = \Gamma^*$ if and only if $L(\mathcal{B}) = K$
if and only if $L(\mathcal{B})$ is closed. □

According to Proposition 3.2 the alphabetic closure is a union over languages of type $\overrightarrow{L/A^\infty}$ or
$\overrightarrow{L/A^\infty} \cap A^{\mathrm{im}}$. But these pieces do not themselves need to be closed, as we can see in the following
example.

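Example 3.6 Let A = {a}, B = {a, b}, and $L = a^*(ab)^* b a^\omega$. Then $L/A^\infty = a^*(ab)^* b a^*$ and
$L/B^\infty$ is the set of all finite prefixes of words in L. We have $\overrightarrow{L/A^\infty} = a^*(ab)^* b a^\infty$ and
$\overrightarrow{L/A^\infty} \cap A^{\mathrm{im}} = a^*(ab)^* b a^\omega = L$. The language $\overrightarrow{L/A^\infty}$ is open but neither $\overrightarrow{L/A^\infty}$ nor $\overrightarrow{L/A^\infty} \cap A^{\mathrm{im}}$ is
closed in the alphabetic topology, because $(ab)^\omega$ belongs to both closures. We have $\overrightarrow{L/B^\infty} =
a^*(ab)^* b a^\infty \cup a^*(ab)^\omega$ and $\overrightarrow{L/B^\infty} \cap B^{\mathrm{im}} = a^*(ab)^\omega$. Both sets are closed. Actually, $\overline{L} = L \cup a^*(ab)^\omega$
in the alphabetic topology.

The alphabetic closure $\overline{L}$ is not closed in the Cantor topology since $a^\omega \notin \overline{L}$, but every
Cantor-open neighborhood of $a^\omega$ contains a word $a^n (ab)^\omega$ for some $n \in \mathbb{N}$. ◇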

Frequently we apply the closure operator to polynomials. A polynomial is a finite union of
monomials. A monomial (of degree k) is a language of the form $A_1^* a_1 \cdots A_k^* a_k A_{k+1}^\infty$ with $a_i \in \Gamma$
and $A_i \subseteq \Gamma$. In particular, $A_1^* a_1 \cdots A_k^* a_k$ is a monomial with $A_{k+1} = \emptyset$. The set $A^*$ is a polynomial
since $A^* = \emptyset^\infty \cup \bigcup_{a \in A} A^* a$. It is not hard to see that polynomials are closed under intersection.
Thus, $A_1^* a_1 \cdots A_k^* a_k A_{k+1}^* = A_1^* a_1 \cdots A_k^* a_k A_{k+1}^\infty \cap \Gamma^*$ is in our language a polynomial, but not a
monomial unless $A_{k+1} = \emptyset$.

A monomial $P = A_1^* a_1 \cdots A_k^* a_k A_{k+1}^\infty$ is called unambiguous, if for every $\alpha \in P$ there exists a
unique factorization $\alpha = u_1 a_1 \cdots u_k a_k \beta$ such that $u_i \in A_i^*$ and $\beta \in A_{k+1}^\infty$. A polynomial is called
unambiguous, if it is a finite union of unambiguous monomials.

Example 3.7 For $\Gamma = \{a, b\}$ the language $\Gamma^* ab \Gamma^\infty$ can be written as an unambiguous monomial,
because:

$\Gamma^* ab \Gamma^\infty = b^* a a^* b \{a, b\}^\infty$.

Similarly, $\Gamma^* ab \Gamma^*$ can be written as an unambiguous polynomial. However, for $\Gamma = \{a, b, c\}$ the
situation is different. Neither $\Gamma^* ab \Gamma^*$ nor $\Gamma^* ab \Gamma^\infty$ is unambiguous. Their syntactic monoid is the
monoid M = {1, a, b, c, ba, 0} defined in Example 2.2, which is not in DA as shown there. So the
claim follows by Theorem 5.5. ◇
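
The notion of unambiguity can be explored on finite words by counting factorizations. The sketch below is only an illustration restricted to the finite part of a monomial (the last block is read as $A_{k+1}^*$); the encoding of a monomial as a list of alphabets plus a string of marker letters, and the names `count_factorizations` and `all_words`, are assumptions made here.

```python
from functools import lru_cache
from itertools import product


def count_factorizations(word, alphabets, letters):
    """Number of factorizations word = u_1 a_1 ... u_k a_k beta with
    u_i in alphabets[i-1]^* and beta in alphabets[k]^* (finite words only)."""
    k = len(letters)

    @lru_cache(maxsize=None)
    def count(pos, i):
        # Ways to factorize word[pos:] when i marker letters have been read.
        if pos == len(word):
            return 1 if i == k else 0
        total = 0
        if word[pos] in alphabets[i]:               # stay inside the current block
            total += count(pos + 1, i)
        if i < k and word[pos] == letters[i]:       # read the next marker letter
            total += count(pos + 1, i + 1)
        return total

    return count(0, 0)


def all_words(alphabet, max_len):
    """All words over alphabet of length at most max_len."""
    for n in range(max_len + 1):
        for letters in product(alphabet, repeat=n):
            yield "".join(letters)


# b^* a a^* b {a,b}^*  versus  Gamma^* a b Gamma^*  over Gamma = {a, b}
UNAMBIGUOUS = ([{"b"}, {"a"}, {"a", "b"}], "ab")
AMBIGUOUS = ([{"a", "b"}, set(), {"a", "b"}], "ab")

print(count_factorizations("abab", *UNAMBIGUOUS))
print(count_factorizations("abab", *AMBIGUOUS))
```

On short words over {a, b} both monomials accept exactly the words containing the factor ab, but only the first one does so with a single factorization per word.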

It follows from the definition of the alphabetic topology that polynomials are open. Actually, 
it is the coarsest topology with this property. The crucial observation is that we have a syntactic 
description of the closure of a polynomial as a finite union of other polynomials. For later use we 
make a more precise statement by considering the closure with respect to different subsets B at 
infinity. 

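Lemma 3.8 Let $P = A_1^* a_1 \cdots A_k^* a_k A_{k+1}^\infty$ be a monomial and $L = P \cap B^{\mathrm{im}}$ for some $B \subseteq A_{k+1}$.
Then the closure of L in the alphabetic topology is given by

$\overline{L} = \bigcup_{\{a_i, \ldots, a_k\} \cup B \subseteq A \subseteq A_i} A_1^* a_1 \cdots A_{i-1}^* a_{i-1} A_i^\infty \cap A^{\mathrm{im}}$.

Proof: First consider an index i with $1 \le i \le k + 1$ such that $\{a_i, \ldots, a_k\} \cup B \subseteq A \subseteq A_i$. Let
$\alpha \in A_1^* a_1 \cdots A_{i-1}^* a_{i-1} A_i^\infty \cap A^{\mathrm{im}}$. We have to show that $\alpha$ is in the closure of L. Let $\alpha = u\beta$
with $u \in A_1^* a_1 \cdots A_{i-1}^* a_{i-1} A_i^*$ and $\beta \in A^\infty \cap A^{\mathrm{im}}$. We show that $uA^\infty \cap L \ne \emptyset$. Choose some
$\gamma \in B^\infty \cap B^{\mathrm{im}}$. As $B \subseteq A_{k+1}$ holds by hypothesis, we see that $u a_i \cdots a_k \gamma \in P$, and hence
$u a_i \cdots a_k \gamma \in uA^\infty \cap L$.

Let now $\alpha \in \overline{L}$ and write $\alpha \in u v_1 \cdots v_{k+1} A^\infty \cap A^{\mathrm{im}}$ with alph($v_j$) = A. There exists $\gamma \in A^\infty$
such that $u v_1 \cdots v_{k+1} \gamma \in P \cap B^{\mathrm{im}}$. This implies $B \subseteq A$. Since $u v_1 \cdots v_{k+1} \gamma \in A_1^* a_1 \cdots A_k^* a_k A_{k+1}^\infty$
there are some $1 \le i, j \le k + 1$ such that $u v_1 \cdots v_{j-1}$ belongs to $A_1^* a_1 \cdots A_{i-1}^* a_{i-1} A_i^*$, $v_j \in A_i^*$,
and $v_{j+1} \cdots v_{k+1} \gamma \in A_i^* a_i \cdots A_k^* a_k A_{k+1}^\infty \cap A^\infty$. Therefore $\{a_i, \ldots, a_k\} \subseteq A \subseteq A_i$, too. It follows
that $\alpha \in A_1^* a_1 \cdots A_{i-1}^* a_{i-1} A_i^\infty \cap A^{\mathrm{im}}$. □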

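Example 3.9 Let $\Gamma = \{a, b, c\}$ and $L = \Gamma^* ab \Gamma^*$. Its closure is given by

$\overline{L} = \Gamma^* ab \Gamma^\infty \cup \{a, b\}^{\mathrm{im}} \cup \{a, b, c\}^{\mathrm{im}} = \Gamma^* ab \Gamma^\infty \cup \Gamma^{\mathrm{im}}$.

◇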

As usual, let $L \subseteq \Gamma^\infty$ be a regular language. Let us define $t f^\omega \le_L s e^\omega$ for linked pairs (s, e),
(t, f) by the implication:

$[s][e]^\omega \subseteq L \Rightarrow [t][f]^\omega \subseteq L$.

With this notation we can give an algebraic characterization of being open.

Lemma 3.10 A regular language $L \subseteq \Gamma^\infty$ is open in the alphabetic topology if and only if for all
linked pairs (s, e), (t, f) of M = Synt(L) with $t, f \in M_e$ we have $s t f^\omega \le_L s e^\omega$.

Proof: Let L be open and $\alpha \in [s][e]^\omega \subseteq L$. We find a finite prefix $u \in [s]$ of $\alpha$ such that
$\alpha \in uA^\infty \subseteq L$. Since $t, f \in M_e$ we may assume alph($vw$) $\subseteq A$ for some $v \in [t]$ and $w \in [f]$. Hence,
$u v w^\omega \in [st][f]^\omega \subseteq L$. This shows $s t f^\omega \le_L s e^\omega$.

For the converse, suppose that for all linked pairs (s, e), (t, f) of M = Synt(L) with $t, f \in M_e$
we have $s t f^\omega \le_L s e^\omega$. Let $\alpha \in [s][e]^\omega \subseteq L$. Write $\alpha = u\beta$ with $u \in [s]$ and $\beta \in [e]^\omega \cap A^\infty \cap A^{\mathrm{im}}$.
Now, any $\gamma \in A^\infty$ can be written as $\gamma \in [t][f]^\omega$ for some linked pair with $t, f \in M_e$. Indeed,
we have $A^* \subseteq [M_e]$: consider $a \in A$ and let $p, q \in A^*$ such that $paq \in [e]$. Then $a \in [M_e]$ and
therefore $A \subseteq [M_e]$. Since $M_e$ is a submonoid, $[M_e]$ is a submonoid of $\Gamma^*$ and hence $A^* \subseteq [M_e]$.
By assumption $u\gamma \in [st][f]^\omega \subseteq L$. It follows $uA^\infty \subseteq L$, i.e., L is open. □



4 The fragment $\Sigma_2$

By a (slight extension of a) result of Thomas [26] on $\omega$-languages we know that a language $L \subseteq \Gamma^\infty$
is definable in $\Sigma_2$ if and only if L is a polynomial. However, this statement alone does not yield
decidability. It turns out that we obtain decidability by a combination of an algebraic and a
topological criterion. (This decidability result has also been shown independently by Bojańczyk [2]
using different techniques.) We know that polynomials are open. Therefore, we concentrate on
algebra.

Lemma 4.1 If $L \subseteq \Gamma^\infty$ is a polynomial, then all idempotents of Synt(L) are locally top.

Proof: By $h_L$ we denote the syntactic homomorphism $\Gamma^* \to \mathrm{Synt}(L)$. Let $n \in \mathbb{N}$ such that L is a
finite union of monomials of degree less than n. Let $h_L(e)$ be idempotent; in particular $e^n \equiv_L e$.
For $e \equiv_L f$ we may assume that alph($f$) $\subseteq$ alph($e$). This means we take the maximal possible
alphabet for e. Let $s \in \mathrm{alph}(e)^*$. We want to show that $x e s e y z^\omega \in L$ if $x e y z^\omega \in L$.

Suppose $u = x e^n y z^\omega \in A_1^* a_1 \cdots A_k^* a_k A_{k+1}^\infty \subseteq L$ and $k < n$. Since there are at most $n - 1$ letters
$a_i$, some factor e of u lies completely within one of the $A_i^*$ or within $A_{k+1}^\infty$, i.e., alph($e$) $\subseteq A_i$ for
some $1 \le i \le k + 1$. Hence, $ese \in A_i^*$ and $x e^{n_1} s e^{n_2} y z^\omega \in A_1^* a_1 \cdots A_k^* a_k A_{k+1}^\infty \subseteq L$ for some
$n_1, n_2 \ge 1$. Since $h_L(e)$ is idempotent, it follows that $x e y z^\omega \in L$ implies $x e s e y z^\omega \in L$. Similarly,
$x (ey)^\omega \in L$ implies $x (esey)^\omega \in L$ and therefore $ese \le_L e$ for all $s \in \mathrm{alph}(e)^*$, i.e., $h_L(e)$ is locally
top. □
Theorem 4.2 Let $L \subseteq \Gamma^\infty$ be a regular language. The following five assertions are equivalent:

1. L is $\Sigma_2$-definable.

2. L is a polynomial.

3. L is open in the alphabetic topology and all idempotents of Synt(L) are locally top.

4. The syntactic monoid M = Synt(L) and the syntactic order $\le_L$ satisfy:

(a) For all linked pairs (s, e), (t, f) with $t, f \in M_e$ we have $s t f^\omega \le_L s e^\omega$.

(b) $e = e^2$ and $s \in M_e$ implies $ese \le_L e$.

5. The following three conditions hold for some homomorphism $h : \Gamma^* \to M$ which weakly
recognizes L:

(a) L is open in the alphabetic topology.

(b) All idempotents of M are locally top.

(c) L is downward closed (on finite prefixes) for h.

Proof: "1 ⇔ 2": This is a slight modification of a result by Thomas [26].

"2 ⇒ 3": By definition, polynomials are open in the alphabetic topology. In Lemma 4.1 it has
been shown that all idempotent elements are locally top.

"3 ⇔ 4": The equivalence of L being open and "4a" is Lemma 3.10. Property "4b" is the
definition of all idempotents being locally top.

"4 ⇒ 5": Let $h = h_L$ be the syntactic homomorphism onto the syntactic monoid M = Synt(L).
Since L is regular, the homomorphism h strongly recognizes L. Applying Lemma 3.10, the property
"5a" follows from "4a" and "5b" trivially follows from "4b". The condition "5c" holds for Synt(L)
by Lemma 2.4.

"5 ⇒ 2": Consider $\alpha \in L$ with im($\alpha$) = A. By "5a" the language L is open. Hence, there exists
a prefix u of $\alpha$ such that $\alpha \in uA^\infty \subseteq L$. From the case of finite words and the hypothesis "5b" on
M, we know that $P = \{v \in \Gamma^* \mid h(v) \le h(u)\}$ is a polynomial. We can assume that all monomials
in P end with a letter. We define the polynomial $P_\alpha = P A^\infty$. Clearly, $L \subseteq \bigcup \{P_\alpha \mid \alpha \in L\}$ and
this union is finite since M is finite. It remains to show that $P_\alpha \subseteq L$ for $\alpha \in L$. Let $v \in P$ and
$\beta \in A^\infty$. We know $u\beta \in L$ and there exists a linked pair (s, e) such that $u\beta \in [s][e]^\omega \subseteq L$. Now,
there exists $w\gamma = \beta$ such that $uw \in [s]$ and $\gamma \in [e]^\omega$. By definition of P, we have $h(v) \le h(u)$ and
therefore $t = h(vw) \le h(uw) = s$. It follows $v\beta = vw\gamma \in [t][e]^\omega \subseteq L$ by "5c". This shows $P_\alpha \subseteq L$
and thus $L = \bigcup \{P_\alpha \mid \alpha \in L\}$. □

Corollary 4.3 It is decidable whether a regular language is $\Sigma_2$-definable.

Proof: The syntactic congruence is computable and the conditions in "3" (or "4") of Theorem 4.2 
are decidable. □ 

Remark 4.4 An $\omega$-language $L \subseteq \Gamma^\omega$ is $\Sigma_2$-definable if $L = \{\alpha \in \Gamma^\omega \mid \alpha \models \varphi\}$ for some $\varphi \in \Sigma_2$.
This is equivalent to $L \cup \Gamma^*$ being $\Sigma_2$-definable as a subset of $\Gamma^\infty$. Thus, the decidability of
Corollary 4.3 transfers to $\omega$-regular languages.

Of course, complementation yields dual results for the fragment $\Pi_2$. In particular, $\Pi_2$-definable
languages are closed in the alphabetic topology.

5 Two variable first-order logic 

Etessami, Vardi, and Wilke have given a characterization of $\mathrm{FO}^2$ in terms of unary temporal
logic [11]. In the same paper, they considered the satisfiability problem for $\mathrm{FO}^2$. We continue the
study of $\mathrm{FO}^2$ over infinite words. It will turn out that the fragments $\mathrm{FO}^2$ and $\Sigma_2$ are incomparable.
Therefore, it makes sense to also consider $\mathrm{FO}^2 \cap \Sigma_2$ and $\mathrm{FO}^2 \cap \Pi_2$.

5.1 The fragment $\mathrm{FO}^2$ and the strict alphabetic topology

This section yields the algebraic characterization of $\mathrm{FO}^2$ in terms of the variety DA. The following
lemma can be proved essentially in the same way as for finite words. The result is also (implicitly)
stated in the habilitation thesis of Wilke [31].

Lemma 5.1 Let $L \subseteq \Gamma^\infty$ be $\mathrm{FO}^2$-definable. Then the syntactic monoid $\mathrm{Synt}(L)$ is in DA.

Proof: Let $L = L(\varphi)$ for some $\mathrm{FO}^2$-formula $\varphi$ of quantifier depth $n$. Let $e^2 = e \in M = \mathrm{Synt}(L)$
and let $s \in M_e$. We can choose words $v, w \in \Gamma^*$ such that $h_L(v) = s$, $h_L(w) = e$, and, moreover,
$\mathrm{alph}(v) \subseteq \mathrm{alph}(w)$. Now, consider words of the form $\alpha = x w^n v w^n y z^\omega$, $\alpha' = x w^n y z^\omega$ and
$\beta = x (w^n v w^n y)^\omega$, $\beta' = x (w^n y)^\omega$. It is easy to show that the second player has a winning strategy
in the $n$-round Ehrenfeucht-Fraïssé game for $\mathrm{FO}^2$ on $(\alpha, \alpha')$ and also on $(\beta, \beta')$. A description
of the game can be found in [13] and the winning strategy is a modification of the proof in the
finitary case [25]. The game equivalence implies that both words in each pair satisfy the same
$\mathrm{FO}^2$-sentences of quantifier depth no more than $n$. In particular, $\alpha \in L$ if and only if $\alpha' \in L$.
Analogously, $\beta \in L$ if and only if $\beta' \in L$. Thus, $\mathrm{Synt}(L) \in \mathbf{DA}$. □
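
Membership of a finite monoid in DA can be tested mechanically. The following is a minimal sketch, assuming the monoid is given as a multiplication table `mult` over the elements `0..n-1` (a hypothetical encoding, not used in the paper) and assuming the standard equational description of DA by the identity $(xy)^\omega x (xy)^\omega = (xy)^\omega$, which is not stated in this section.

```python
from itertools import product


def idempotent_power(mult, x):
    """Return x^omega, the unique idempotent among the powers x, x^2, x^3, ..."""
    p = x
    while mult[p][p] != p:
        p = mult[p][x]  # move from x^n to x^(n+1)
    return p


def is_in_DA(mult):
    """Check the identity (xy)^omega x (xy)^omega = (xy)^omega for all x, y."""
    n = len(mult)
    for x, y in product(range(n), repeat=2):
        e = idempotent_power(mult, mult[x][y])
        if mult[mult[e][x]][e] != e:
            return False
    return True


# U1 = {1, 0} with 0 absorbing (encoded as 0 = identity, 1 = zero) lies in DA.
U1 = [[0, 1], [1, 1]]
# The cyclic group Z2 is not aperiodic and hence not in DA.
Z2 = [[0, 1], [1, 0]]
print(is_in_DA(U1), is_in_DA(Z2))
```

The test is quadratic in the number of elements, which is enough to experiment with small syntactic monoids.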

A set like $A^{\mathrm{im}}$ is $\mathrm{FO}^2$-definable, but it is neither open nor closed in the alphabetic topology,
in general. Therefore, we need a refinement of the alphabetic topology. As a basis for the strict
alphabetic topology we take all sets of the form $uA^\infty \cap A^{\mathrm{im}}$. Thus, more sets are open (and closed)
than in the alphabetic topology. Another way to define the strict alphabetic topology is to say
that it is the coarsest topology on $\Gamma^\infty$ where all sets of the form $A_1^* a_1 \cdots A_k^* a_k A_{k+1}^\infty \cap B^{\mathrm{im}}$ are
open. The strict alphabetic topology is not used outside this section, but it is essential here in
order to prove the converse of Lemma 5.1.
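
To make the basis sets concrete, here is a small sketch that decides membership in $wA^\infty \cap A^{\mathrm{im}}$ for words given in lasso form. The representation is an assumption made for illustration: the pair `(u, v)` stands for the infinite word $uv^\omega$ if `v` is non-empty and for the finite word $u$ otherwise, so that $\mathrm{im}(uv^\omega) = \mathrm{alph}(v)$ and $\mathrm{im}(u) = \emptyset$.

```python
def in_basic_strict_open(u, v, w, A):
    """Decide whether the word (u, v) lies in  w A^infty  intersected with  A^im."""
    A = set(A)
    if v:
        # Unroll the loop far enough to compare against the prefix w.
        reps = max(0, len(w) - len(u)) // len(v) + 1
        expanded = u + v * reps
    else:
        expanded = u
    if not expanded.startswith(w):
        return False
    # Letters occurring after the prefix w (every letter of v occurs again later).
    rest = set(expanded[len(w):]) | set(v)
    return rest <= A and set(v) == A


# c(ab)^omega lies in ca{a,b}^infty with im = {a,b} ...
print(in_basic_strict_open("c", "ab", "ca", "ab"))
# ... but not in c{a,b,c}^infty intersected with {a,b,c}^im, although it lies in c{a,b,c}^infty.
print(in_basic_strict_open("c", "ab", "c", "abc"))
```

For a finite word the only basic sets containing it have $A = \emptyset$, so they are singletons; this matches the fact that finite words are isolated points.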

Lemma 5.2 If $L \subseteq \Gamma^\infty$ is strongly recognized by some homomorphism $h : \Gamma^* \to M \in \mathbf{DA}$, then
$L$ is clopen in the strict alphabetic topology.

Proof: Since $h$ strongly recognizes $\Gamma^\infty \setminus L$ as well, it is enough to show that $L$ is open. Let $\alpha \in L$
with $\alpha \in [s][e]^\omega$ for some linked pair $(s, e)$ and let $A = \mathrm{im}(\alpha)$. We show that $[s]A^\infty \cap A^{\mathrm{im}} \subseteq L$.
Indeed, let $\beta \in [s]A^\infty \cap A^{\mathrm{im}}$. Then we have $\beta = uv\gamma$ with $h(u) = s$, $h(v) = r$, $\gamma \in [f]^\omega$
where $v \in A^*$, $\mathrm{alph}(\gamma) = \mathrm{im}(\gamma) = A$ and $(sr, f)$ is a linked pair. Since $M \in \mathbf{DA}$, we obtain
$s = se = serfe = srfe$ and $efe = e$ and $fef = f$. We have $[sr][fef]^\omega \cap [srfe][efe]^\omega \neq \emptyset$ and
$[srfe][efe]^\omega = [s][e]^\omega \subseteq L$. Since $h$ strongly recognizes $L$, we have $[sr][f]^\omega = [sr][fef]^\omega \subseteq L$, too.
In particular, $\beta \in L$. □

Lemma 5.3 If $L$ is closed in the strict alphabetic topology and if $L$ is weakly recognized by some
homomorphism $h : \Gamma^* \to M \in \mathbf{DA}$, then $L$ is a finite union of languages $A_1^* a_1 \cdots A_k^* a_k A_{k+1}^\infty \cap A_{k+1}^{\mathrm{im}}$,
where each $A_1^* a_1 \cdots A_k^* a_k A_{k+1}^\infty$ is an unambiguous monomial.

Proof: Let $\alpha \in L$. Write $\alpha = u\beta$ with $\beta \in A^\infty \cap A^{\mathrm{im}}$ for some $A \subseteq \Gamma$. There is a linked
pair $(s, e)$ with $\alpha \in [s][e]^\omega \subseteq L$ and we may assume $h(u) = s$ and $\beta \in [e]^\omega$. For $A = \emptyset$ we
have $[s] \subseteq L$ and, using our knowledge about the finite case, we may include $[s]$ in our finite
union of unambiguous polynomials. Therefore, let $A \neq \emptyset$. We may choose an unambiguous
monomial $P = A_1^* a_1 \cdots A_k^* a_k \subseteq [s]$ such that $u \in P$ and each last position of every letter
$a \in \{a_1, \ldots, a_k\} \cup A_1 \cup \cdots \cup A_k$ occurs explicitly as some $a_j$ in the expression $P$. Note that $[s]$ is a
finite union of such monomials. Moreover, we may assume that $uv \in P$ for infinitely many prefixes
$v \le \beta$. Each such $uv$ can uniquely be written as $uv = v_1 a_1 \cdots v_k a_k$ with $v_i \in A_i^*$. This yields a
vector in $\mathbb{N}^k$ by $(|v_1 a_1|, |v_1 a_1 v_2 a_2|, \ldots, |v_1 a_1 \cdots v_k a_k|)$ for every $uv \in P$. By Dickson's Lemma [6],
every infinite sequence in $\mathbb{N}^k$ contains an infinite subsequence which is non-decreasing in every
component. Therefore, we may assume that the sequence of vectors induced by the prefixes $uv$
is in no component decreasing when $uv$ gets longer. In addition (after removing finitely many
$uv$'s) we may assume there is some $i \ge 0$ such that the component $|v_1 a_1 \cdots v_i a_i|$ is constant
and $|v_1 a_1 \cdots v_i a_i v_{i+1} a_{i+1}|$ is strictly increasing. It follows that we may assume
$\{a_{i+1}, \ldots, a_k\} \subseteq \mathrm{alph}(v_{i+1}) = A \subseteq A_{i+1}$. In particular, $\alpha \in A_1^* a_1 \cdots A_i^* a_i A^\infty \cap A^{\mathrm{im}}$. It is clear that this expression
is unambiguous.

It remains to show $A_1^* a_1 \cdots A_i^* a_i A^\infty \cap A^{\mathrm{im}} \subseteq L$. Consider $u'\gamma$ with $u' \in A_1^* a_1 \cdots A_i^* a_i$ and
$\gamma \in A^\infty \cap A^{\mathrm{im}}$. Since $L$ is closed, it is enough to show that $u'\gamma$ belongs to the closure of $L$ in the strict
alphabetic topology. Choose any prefix $w \le \gamma$. It is enough to show that $u'wA^\infty \cap A^{\mathrm{im}} \cap L \neq \emptyset$. Let
$z \in \Gamma^*$ with $\mathrm{alph}(z) = A$ and $h(z) = e$. Since $w \in A^* \subseteq A_{i+1}^*$, we have $u'w a_{i+1} \cdots a_k \in P \subseteq [s]$.
Hence $u'w a_{i+1} \cdots a_k z^\omega \in [s][e]^\omega \subseteq L$. □

The next statement follows again as in the case of finite words. 

Lemma 5.4 Every language $A^{\mathrm{im}}$ and every unambiguous monomial $A_1^* a_1 \cdots A_k^* a_k A_{k+1}^\infty$ is $\mathrm{FO}^2$-definable.

Proof: The language of non-empty words in $A^{\mathrm{im}}$ is defined by the $\mathrm{FO}^2$-sentence

$\bigwedge_{a \in A} (\forall x \exists y : x < y \wedge \lambda(y) = a) \;\wedge\; \bigwedge_{a \notin A} (\exists x \forall y : x < y \rightarrow \lambda(y) \neq a)$.

We use induction on $k$ in order to show that $P = A_1^* a_1 \cdots A_k^* a_k A_{k+1}^\infty$ is $\mathrm{FO}^2$-definable. Clearly,
for $k = 0$ this is true. Let now $k \ge 1$. By unambiguity, we cannot have $\{a_1, \ldots, a_k\} \subseteq A_1 \cap A_{k+1}$
since for $(a_1 \cdots a_k)^2$ there would exist two different factorizations. First, suppose $a_i \notin A_{k+1}$. Let
$\alpha = \alpha_1 a_i \alpha_2 \in P$ where $a_i \notin \mathrm{alph}(\alpha_2)$. There are two possibilities: the last $a_i$ of $\alpha$ could be one of
the $a_j$'s, $i \le j \le k$, and then

$\alpha_1 \in A_1^* a_1 \cdots A_j^*$, $a_i = a_j$, $\alpha_2 \in A_{j+1}^* a_{j+1} \cdots A_k^* a_k A_{k+1}^\infty$

or it matches some $A_j^*$, $i < j < k + 1$, and then

$\alpha_1 \in A_1^* a_1 \cdots A_j^*$, $a_i \in A_j$, $\alpha_2 \in A_j^* a_j \cdots A_k^* a_k A_{k+1}^\infty$.

In any case, the remaining four polynomials are unambiguous and their degree is strictly smaller
than $k$. Hence, by induction we have $\mathrm{FO}^2$-formulas describing them. Obviously, we can also
express intersections with languages of the form $B^*$ or $B^\infty$ for $B \subseteq \Gamma$. So there is a finite list
of $\mathrm{FO}^2$-formulas such that for each $\alpha \in P$ there are formulas $\varphi$ and $\psi$ from the list and a letter
$a \in \Gamma$ with $\alpha \in L(\varphi) a L(\psi) \subseteq P$ and $L(\psi) \subseteq (\Gamma \setminus \{a\})^\infty$. Now, the last $a$-position $x$ in every
$\alpha \in L(\varphi) a L(\psi)$ is uniquely defined by

$\xi(x) = \big(\lambda(x) = a \wedge \forall y : x < y \rightarrow \lambda(y) \neq a\big)$.

Using relativization techniques, we now define $\mathrm{FO}^2$-sentences $\varphi_{<a}$ and $\psi_{>a}$ such that
$L(\varphi) a L(\psi) = L(\varphi_{<a} \wedge \exists x : \xi(x) \wedge \psi_{>a})$. We give the inductive construction for $\psi_{>a}$. The other one for $\varphi_{<a}$ is
symmetric. Atomic formulas are unchanged and Boolean connectives are straightforward. Existential
quantification is as follows: $(\exists x : \zeta)_{>a} = \exists x : (\exists y : y < x \wedge \xi(y)) \wedge \zeta_{>a}$.

The case $a_i \notin A_1$ is similar (using a factorization of $\alpha$ at the first $a_i$-position). □

Theorem 5.5 Let $L \subseteq \Gamma^\infty$. The following assertions are equivalent:

1. $L$ is $\mathrm{FO}^2$-definable.

2. $L$ is regular and $\mathrm{Synt}(L) \in \mathbf{DA}$.

3. $L$ is strongly recognized by some homomorphism $h : \Gamma^* \to M \in \mathbf{DA}$.

4. $L$ is closed in the strict alphabetic topology and $L$ is weakly recognized by some homomorphism
$h : \Gamma^* \to M \in \mathbf{DA}$.

5. $L$ is a finite union of sets of the form $A_1^* a_1 \cdots A_k^* a_k A_{k+1}^\infty \cap A_{k+1}^{\mathrm{im}}$, where each language
$A_1^* a_1 \cdots A_k^* a_k A_{k+1}^\infty$ is an unambiguous monomial.

Proof: "1 $\Rightarrow$ 2": First-order definable languages are regular; $\mathrm{Synt}(L) \in \mathbf{DA}$ by Lemma 5.1.
"2 $\Rightarrow$ 3": Trivial, since $\mathrm{Synt}(L)$ strongly recognizes $L$. "3 $\Rightarrow$ 4": Strong recognition implies weak
recognition; closedness in the strict alphabetic topology follows by Lemma 5.2. "4 $\Rightarrow$ 5": Lemma 5.3.
"5 $\Rightarrow$ 1": Lemma 5.4. □

Restricted to languages in $\Gamma^*$ the fragment $\mathrm{FO}^2$ is equal to $\Delta_2$, hence it is equal to a fragment
of $\Sigma_2$. In general we have the following upper bound for languages over finite and infinite words.

Corollary 5.6 For languages in $\Gamma^\infty$ the fragment $\mathrm{FO}^2$ is contained in the Boolean closure of $\Sigma_2$.

Proof: Every $\mathrm{FO}^2$-definable language is a Boolean combination of unambiguous monomials and
alphabetic restrictions of the form $A^{\mathrm{im}}$. By Theorem 4.2, monomials are $\Sigma_2$-definable and in the
proof of Lemma 5.4 we have seen that $A^{\mathrm{im}}$ is definable in the Boolean closure of the fragment $\Sigma_2$. □

Recall that if a language $L \subseteq \Gamma^\infty$ is weakly recognizable by some finite monoid, then it is also
strongly recognizable by a finite monoid. The same holds for aperiodic monoids: if L is weakly 
recognizable by some finite aperiodic monoid, then there is a finite aperiodic monoid which strongly 
recognizes L. Theorem 5.5 suggests that this fails for DA. Indeed, we have the following example. 

Example 5.7 Let $\Gamma = \{a, b, c\}$. Consider the congruence of finite index such that each class $[u]$ is
defined by the set of words $v$ where $u$ and $v$ agree on all suffixes of length at most 2. The quotient
monoid of $\Gamma^*$ by this congruence is in DA. In fact, it is a very simple monoid within DA since it
is $\mathcal{L}$-trivial (where $\mathcal{L}$ is one of Green's relations, see e.g. [16]). Let $L = [ab]^\omega = (\Gamma^* ab)^\omega$. Then, by
definition, $L$ is weakly recognizable in DA; and $L$ is the language of all $\alpha$ which contain infinitely
many factors of the form $ab$. This language is however not open in the strict alphabetic topology
since $(cab)^\omega \in (\Gamma^* ab)^\omega$, but $(cab)^m (acb)^\omega \notin (\Gamma^* ab)^\omega$ for all $m \ge 0$. ◇
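
The counterexample can be checked mechanically on ultimately periodic words. The sketch below assumes a lasso encoding `(u, v)` for $uv^\omega$ with non-empty `v` (an illustrative choice, not notation from the paper). A factor of length 2 occurs infinitely often in $uv^\omega$ exactly if it occurs in $vv$, because every occurrence inside the periodic part starts within one copy of $v$ and ends at most one letter into the next copy.

```python
def infinitely_many_ab(u, v):
    """Does u v^omega (v non-empty) contain infinitely many factors 'ab'?"""
    if not v:
        raise ValueError("v must be non-empty for an infinite word")
    return "ab" in v + v


# (cab)^omega is in L = (Gamma^* ab)^omega ...
print(infinitely_many_ab("", "cab"))
# ... but (cab)^m (acb)^omega is not, although it shares the prefix (cab)^m
# with (cab)^omega and also has im = {a, b, c}.
print([infinitely_many_ab("cab" * m, "acb") for m in range(4)])
```

Since every basic set $wA^\infty \cap A^{\mathrm{im}}$ containing $(cab)^\omega$ has $A = \{a, b, c\}$ and $w$ a prefix of some $(cab)^m$, each such set also contains a word outside $L$.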

5.2 Unambiguous polynomials and the fragment $\mathrm{FO}^2 \cap \Sigma_2$

In this section, we show that the intersection of $\mathrm{FO}^2$ and $\Sigma_2$ has very natural descriptions involving
topological notions or unambiguous polynomials.

Theorem 5.8 Let $L \subseteq \Gamma^\infty$. The following assertions are equivalent:

1. $L$ is both $\mathrm{FO}^2$-definable and $\Sigma_2$-definable.

2. $L$ is $\mathrm{FO}^2$-definable and open in the alphabetic topology.

3. $L$ is an unambiguous polynomial, i.e., $L$ is a finite union of unambiguous monomials of the
form $A_1^* a_1 \cdots A_k^* a_k A_{k+1}^\infty$.

4. $L$ is the interior in the alphabetic topology of some $\mathrm{FO}^2$-definable language.

Proof: "1 $\Rightarrow$ 2": Theorem 4.2.

"2 $\Rightarrow$ 3": Let $\alpha \in L \in \mathrm{FO}^2 \cap \Sigma_2$. By Theorem 5.5 we choose an unambiguous monomial
$P = A_1^* a_1 \cdots A_k^* a_k$ (from a given finite set depending on $L$) and $A \subseteq \Gamma$ such that $PA^\infty \cap A^{\mathrm{im}}$ is
unambiguous and $\alpha \in PA^\infty \cap A^{\mathrm{im}} \subseteq L$. W.l.o.g. $A \neq \emptyset$. Let $A = \{b_1, \ldots, b_m\}$ and $B_i = A \setminus \{b_i\}$
and $R = B_1^* b_1 \cdots B_m^* b_m$. Let $L$ be strongly recognized by $h : \Gamma^* \to M$. To every sequence $v_1 \cdots v_n$
with $v_i \in \Gamma^*$ we can assign a complete graph with vertices $\{0, \ldots, n\}$ where the edge $\{i, j\}$ with
$i < j$ is colored by the monoid element $h(v_{i+1} \cdots v_j) \in M$. By Ramsey's Theorem [20] there
exists $r \in \mathbb{N}$ such that for every sequence $v_1 \cdots v_r$ with $v_i \in \Gamma^*$ there are $1 \le j \le \ell \le r$ with
$h(v_j \cdots v_\ell) = e = e^2$ in $M$.
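
The consequence of Ramsey's Theorem used here is easy to explore by brute force. The following sketch does not compute the bound $r$; it only searches a given sequence of monoid elements for a consecutive factor whose product is idempotent. The multiplication-table encoding `mult` and the function name are assumptions made for illustration.

```python
def find_idempotent_factor(mult, seq):
    """Return 0-based indices (j, l) with seq[j] * ... * seq[l] idempotent, or None."""
    n = len(seq)
    for j in range(n):
        prod = None
        for l in range(j, n):
            # Extend the product seq[j] * ... * seq[l] by one more element.
            prod = seq[l] if prod is None else mult[prod][seq[l]]
            if mult[prod][prod] == prod:
                return j, l
    return None


# Z3 under addition mod 3: the only idempotent is the identity 0.
Z3 = [[(a + b) % 3 for b in range(3)] for a in range(3)]
print(find_idempotent_factor(Z3, [1, 1, 1]))  # the whole sequence multiplies to 0
print(find_idempotent_factor(Z3, [1, 1]))     # too short: no idempotent factor
```

The second call illustrates why the sequence has to be long enough relative to the monoid before an idempotent factor is guaranteed.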

Trivially, we have $\alpha \in PR^r A^\infty$. The monomial $PR^r A^\infty$ is unambiguous and for some fixed
language $L$ we consider only finitely many of them. We claim that $PR^r A^\infty \subseteq L$. Let $\beta \in PR^r A^\infty$
and write $\beta = u v_1 \cdots v_r \gamma$ with $u \in P$, $v_i \in R$, and $\gamma \in A^\infty$. Choose $v_j \cdots v_\ell = v$ such that $h(v)$
is idempotent. Then $u v_1 \cdots v_\ell v^\omega \in PA^\infty \cap A^{\mathrm{im}} \subseteq L$. Since $L$ is open and $\mathrm{alph}(v) = A$ we have
$u v_1 \cdots v_\ell v^s A^\infty \subseteq L$ for some $s \in \mathbb{N}$. By strong recognition and by idempotency of $h(v)$ we see
that $\beta \in u v_1 \cdots v_\ell A^\infty \subseteq L$. Therefore, $PR^r A^\infty \subseteq L$.

"3 $\Rightarrow$ 1": Theorem 4.2 and Theorem 5.5.

"3 $\Rightarrow$ 4": Trivial.

"4 $\Rightarrow$ 2": It suffices to show that the interior of an $\mathrm{FO}^2$-definable language is again $\mathrm{FO}^2$-definable.
Since $\mathrm{FO}^2$ is closed under complement, this is equivalent to saying that the closure $\overline{K}$ of
an $\mathrm{FO}^2$-definable language $K$ is $\mathrm{FO}^2$-definable. By Theorem 5.5 we may assume that $K = P \cap B^{\mathrm{im}}$
where $P = A_1^* a_1 \cdots A_k^* a_k A_{k+1}^\infty$ is an unambiguous monomial and $B = A_{k+1}$. By Lemma 3.8 we
obtain

$\overline{K} = \bigcup_{\{a_i, \ldots, a_k\} \cup B \,\subseteq\, A \,\subseteq\, A_i} A_1^* a_1 \cdots A_{i-1}^* a_{i-1} A_i^\infty \cap A^{\mathrm{im}}$.

By Theorem 5.5 we see that $\overline{K}$ is $\mathrm{FO}^2$-definable. □

5.3 The fragment $\mathrm{FO}^2 \cap \Pi_2$

Next, we discuss properties of closed unambiguous polynomials and closed unambiguous monomials.

Theorem 5.9 Let $L \subseteq \Gamma^\infty$ be a regular language. The following assertions are equivalent:

1. $L$ is both $\mathrm{FO}^2$-definable and $\Pi_2$-definable.

2. $L$ is $\mathrm{FO}^2$-definable and closed in the alphabetic topology.

3. $L$ is the closure in the alphabetic topology of some $\mathrm{FO}^2$-definable language.

Proof: The equivalence is the dual statement of the equivalence of "1", "2", and "4" in Theorem 5.8. □

Theorem 5.9 is not fully satisfactory since we do not have any direct characterization in terms
of polynomials. We might imagine that if $L$ is closed (and $L \in \mathrm{FO}^2 \cap \Pi_2$), then it is a finite union
of languages $K \cap B^{\mathrm{im}}$ where each $K \cap B^{\mathrm{im}}$ is closed. But this is not true: Let $L = \Gamma^* a \cup \Gamma^\omega$,
then $L$ is closed and in $\mathrm{FO}^2 \cap \Pi_2$, but cannot be written in this form because its finite part
$L \cap \Gamma^* = \Gamma^* a$ is not closed. We also note that the closure of a language $L \in \mathrm{FO}^2 \cap \Sigma_2$ is not
necessarily in $\Delta_2$. A counter-example is the language $L = \Gamma^* abc$. By Lemma 3.8, the closure of $L$ is
$\overline{L} = L \cup \Gamma^{\mathrm{im}}$, which is not $\Sigma_2$-definable.

We have however a characterization when certain unambiguous monomials are closed: 

Proposition 5.10 Let $A_1^* a_1 \cdots A_k^* a_k A^\infty$ be unambiguous with $A_i \subseteq \{a_i, \ldots, a_k\}$ for all $1 \le i \le k$
and let $P = A_1^* a_1 \cdots A_k^* a_k A^\infty \cap B^{\mathrm{im}}$ for some $B \subseteq A$. The following assertions are equivalent:

1. There is no $1 \le i \le k$ such that $B \subseteq \{a_i, \ldots, a_k\} \subseteq A_i$.

2. The unambiguous monomial $P = A_1^* a_1 \cdots A_k^* a_k A^\infty \cap B^{\mathrm{im}}$ is closed in the alphabetic topology.

Proof: "1 $\Rightarrow$ 2": Assume by contradiction that $P$ is not closed. Let $\alpha \notin P$ with $\mathrm{im}(\alpha) = C$
such that $\alpha$ is in the closure of $P$. Then, by Lemma 3.8, there is some $1 \le i \le k$ such that
$\{a_i, \ldots, a_k\} \cup B \subseteq C \subseteq A_i$. Thus, $\{a_i, \ldots, a_k\} = C = A_i$ since by hypothesis $A_i \subseteq \{a_i, \ldots, a_k\}$.
Since $\alpha$ is in the closure of $P$ we have $B \subseteq C = \{a_i, \ldots, a_k\} = A_i$. This is a contradiction to "1".

"2 $\Rightarrow$ 1": Assume by contradiction that $B \subseteq \{a_i, \ldots, a_k\} \subseteq A_i$ for some $1 \le i \le k$. We
have $a_1 \cdots a_{i-1} (a_i \cdots a_k)^m B^\infty \cap B^{\mathrm{im}} \subseteq P$ for all $m \ge 1$ because $B \subseteq A$. As $P$ is closed
and $B \subseteq \{a_i, \ldots, a_k\}$ we see $a_1 \cdots a_{i-1} (a_i \cdots a_k)^\omega \in P$ and hence $\{a_i, \ldots, a_k\} \subseteq A$. But this
is a contradiction to the fact that $P$ is unambiguous since $\{a_i, \ldots, a_k\} \subseteq A_i \cap A$ implies that
$a_1 \cdots a_{i-1} (a_i \cdots a_k)^2$ has two different factorizations. □

5.4 The relation between $\mathrm{FO}^2$ and $\Sigma_2 \cap \Pi_2$

For finite words we have the well-known theorem that $\mathrm{FO}^2$-definability is equivalent to
$\Delta_2$-definability. However, this does not transfer to infinite words, where $\Delta_2$ forms a proper
subclass of $\mathrm{FO}^2$. Consider $L = \{a, b\}^{\mathrm{im}}$, then $L$ is neither open nor closed, in general. Hence
$L \in \mathrm{FO}^2 \setminus (\Sigma_2 \cup \Pi_2)$. The result for finite words is therefore somewhat misleading. The correct
translation for the general case is given in the following theorem, which covers the situation for
finite words by choosing $A = \emptyset$.

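Theorem 5.11 For all $A \subseteq \Gamma$ the following assertions are equivalent:

1. $L \cap A^{\mathrm{im}}$ is $\mathrm{FO}^2$-definable.

2. There are languages $L_\sigma \in \mathrm{FO}^2 \cap \Sigma_2$ and $L_\pi \in \mathrm{FO}^2 \cap \Pi_2$ such that

$L \cap A^{\mathrm{im}} = L_\sigma \cap A^{\mathrm{im}} = L_\pi \cap A^{\mathrm{im}}$.

3. There are languages $L_\sigma \in \Sigma_2$ and $L_\pi \in \Pi_2$ such that

$L \cap A^{\mathrm{im}} = L_\sigma \cap A^{\mathrm{im}} = L_\pi \cap A^{\mathrm{im}}$.

Proof: "1 $\Rightarrow$ 2": By Theorem 5.5 we see that $L \cap A^{\mathrm{im}}$ is a finite union of unambiguous monomials
$A_1^* a_1 \cdots A_k^* a_k A^\infty \cap A^{\mathrm{im}}$. We let $L_\sigma$ be the finite union of the monomials $A_1^* a_1 \cdots A_k^* a_k A^\infty$; by
Theorem 5.8 we obtain $L_\sigma \in \mathrm{FO}^2 \cap \Sigma_2$. Let $K$ be the complement of $L \cap A^{\mathrm{im}}$. Then $K$ and
$K \cap A^{\mathrm{im}}$ are $\mathrm{FO}^2$-definable. Thus, $K \cap A^{\mathrm{im}} = K_\sigma \cap A^{\mathrm{im}}$ for some $K_\sigma \in \mathrm{FO}^2 \cap \Sigma_2$. Let $L_\pi$ be the
complement of $K_\sigma$. Then $L_\pi \in \mathrm{FO}^2 \cap \Pi_2$ and $L \cap A^{\mathrm{im}} = L_\pi \cap A^{\mathrm{im}}$.

"2 $\Rightarrow$ 3": Trivial.

"3 $\Rightarrow$ 1": If $L = L_\sigma \cap A^{\mathrm{im}}$, then a slight modification of the proof for Lemma 4.1 shows that all idempotents in
$\mathrm{Synt}(L)$ are locally top. Identically, if $L = L_\pi \cap A^{\mathrm{im}}$, then all idempotents in $\mathrm{Synt}(L)$ are locally
bottom. Thus $\mathrm{Synt}(L) \in \mathbf{DA}$, and by Theorem 5.5 we see that $L$ is $\mathrm{FO}^2$-definable. □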

6 The fragment $\Delta_2 = \Sigma_2 \cap \Pi_2$

The first-order fragment $\Delta_2$ is the intersection of $\Sigma_2$ and $\Pi_2$. It is the largest subclass of $\Sigma_2$
(and also of $\Pi_2$) which is closed under negation. Since over finite and infinite words we have
$\Gamma^\omega \notin \Sigma_2$ and $\Gamma^* \notin \Pi_2$, we obtain different intersections $\Sigma_2 \cap \Pi_2$ depending on whether we
consider finite words, infinite words, or simultaneously finite and infinite words. In this section,
we will give characterizations of $\Delta_2$ for infinite words and for finite and infinite words $\Gamma^\infty$. In
both settings, it will turn out that $\Delta_2$ is a strict subclass of $\mathrm{FO}^2$.

6.1 Clopen unambiguous monomials 

Languages in $\Sigma_2$ are open and languages in $\Pi_2$ are closed. Hence, a language in $\Delta_2$ must be clopen
in the alphabetic topology. The first step towards a convenient characterization of $\Delta_2$ is therefore
a description of clopen unambiguous monomials.

Lemma 6.1 Let $P = A_1^* a_1 \cdots A_k^* a_k A^\infty$ be an unambiguous monomial. The following assertions
are equivalent:

1. There is no $1 \le i \le k$ such that $\{a_i, \ldots, a_k\} \subseteq A_i$.

2. $P$ is closed in the alphabetic topology.

3. $P$ is clopen in the alphabetic topology.

Proof: "1 $\Rightarrow$ 2": By Lemma 3.8 (setting $A_{k+1} = A$) we see that the closure of $P$ is:

$\bigcup_{\{a_i, \ldots, a_k\} \,\subseteq\, B \,\subseteq\, A_i} A_1^* a_1 \cdots A_{i-1}^* a_{i-1} A_i^\infty \cap B^{\mathrm{im}}$.

Since there is no $\{a_i, \ldots, a_k\} \subseteq A_i$ for $1 \le i \le k$, we see that this union is just $P$ itself. Therefore,
$P$ is closed. "2 $\Rightarrow$ 3" is clear, because $P$ is open. "3 $\Rightarrow$ 1": Assume by contradiction that
$\{a_i, \ldots, a_k\} \subseteq A_i$ for some $1 \le i \le k$. We have $a_1 \cdots a_{i-1} (a_i \cdots a_k)^m \in P$ for all $m \ge 1$. As $P$ is
closed we see $a_1 \cdots a_{i-1} (a_i \cdots a_k)^\omega \in P$ and hence $\{a_i, \ldots, a_k\} \subseteq A$. But this is a contradiction to
the fact that $P$ is unambiguous since $\{a_i, \ldots, a_k\} \subseteq A_i \cap A$ implies that $a_1 \cdots a_{i-1} (a_i \cdots a_k)^2 \in P$
has two different factorizations. □



Lemma 6.2 Let $L \subseteq \Gamma^\infty$ be a closed polynomial. For every unambiguous monomial

$P = A_1^* a_1 \cdots A_k^* a_k A^\infty \subseteq L$

there exist closed unambiguous monomials $Q_1, \ldots, Q_\ell$ such that $P \subseteq Q_1 \cup \cdots \cup Q_\ell \subseteq L$, i.e., there
exists a finite covering of $P$ with closed unambiguous monomials in $L$.

Proof: We start with a normalization procedure in which we begin with making the last appearances
of the letters in $A_i^*$ explicit. We have $B^* = (B \setminus \{b\})^* \cup B^* b (B \setminus \{b\})^*$ for every $b \in B$.
This yields the substitution rule of replacing $A_i^*$ in $P$ by $(A_i \setminus \{a\})^*$ and also by $A_i^* a (A_i \setminus \{a\})^*$
which gives two new monomials. After iterating this substitution rule a finite number of times,
we obtain unambiguous monomials of the form $P'_j = B_1^* b_1 \cdots B_s^* b_s A^\infty$ such that $P = \bigcup_j P'_j$ and
$B_i \subseteq \{b_i, \ldots, b_s\}$ for every $1 \le i \le s$. In the next phase of the normalization procedure we make
the first appearances of the letters in $A^\infty$ explicit. We have $B^\infty = (B \setminus \{b\})^\infty \cup (B \setminus \{b\})^* b B^\infty$
for every $b \in B$. As above, this yields a substitution rule and after a finite number of applications
to the $P'_j$ we obtain unambiguous monomials of the form $P''_j = B_1^* b_1 \cdots B_s^* b_s B_{s+1}^* b_{s+1} \cdots B_t^* b_t A^\infty$
such that $P = \bigcup_j P''_j$ and the following properties hold:

• $B_i \subseteq \{b_i, \ldots, b_t\}$ for every $1 \le i \le s$.

• $\{b_i, \ldots, b_t\} \not\subseteq B_i$ for all $s + 1 \le i \le t$.

• $A = \{b_{s+1}, \ldots, b_t\}$.

It suffices to prove the lemma for $P = B_1^* b_1 \cdots B_s^* b_s B_{s+1}^* b_{s+1} \cdots B_t^* b_t A^\infty$ with the above properties.
If $P$ is not closed, then by Lemma 6.1 there exists $1 \le i \le s$ such that $B_i \supseteq \{b_i, \ldots, b_t\}$, and
hence $A \subseteq B_i = \{b_i, \ldots, b_t\}$ due to the normalization procedure. We fix the minimal index $i$ with
this property.

Next, we use a Ramsey argument. Let $L$ be strongly recognized by $h : \Gamma^* \to M$ and let
$r = r(M)$ be the Ramsey number such that every complete edge-colored graph with $r$ nodes and using
at most $|M|$ colors contains a monochromatic triangle. We have $B_i^* = (B_i \setminus \{b_j\})^* \cup (B_i \setminus \{b_j\})^* b_j B_i^*$
and $B_i \setminus \{b_j\}$ is no longer a superset of $\{b_i, \ldots, b_t\}$. Therefore, we only have to consider the case
where we replace the factor $b_{i-1} B_i^* b_i$ in $P$ by $b_{i-1} (B_i \setminus \{b_j\})^* b_j B_i^* b_i$ for some $i \le j \le t$. Repeating
this procedure we are left with a situation where we have replaced $b_{i-1} B_i^* b_i$ in $P$ by $b_{i-1} R^r B_i^* b_i$
in $P$ where

$R = (B_i \setminus \{b_i\})^* b_i (B_i \setminus \{b_{i+1}\})^* b_{i+1} \cdots (B_i \setminus \{b_t\})^* b_t$.

Note that the resulting monomial $P$ is unambiguous and that the alphabet of every word in $R$ is
$B_i = \{b_i, \ldots, b_t\}$.

Now consider $\alpha = u v_1 \cdots v_r \in B_1^* b_1 \cdots B_{i-1}^* b_{i-1} R^r$, with $v_j \in R$ for all $1 \le j \le r$. By
the choice of $r$ being the Ramsey number for triangles we find some $j_1 < j_2 < j_3$ such that
$h(v_{j_1} \cdots v_{j_2}) = h(v_{j_2+1} \cdots v_{j_3}) = h(v_{j_1} \cdots v_{j_3})$ is idempotent in the monoid $M$. Since $L$ is closed
we see that

$u v_1 \cdots v_{j_1-1} (v_{j_1} \cdots v_{j_2})^\omega \in L$.

Indeed, for each prefix $w_m = u v_1 \cdots v_{j_1-1} (v_{j_1} \cdots v_{j_2})^m$ we have $\mathrm{alph}(v_{j_1}) = \{b_i, \ldots, b_t\} = B_i$ and
$w_m b_i \cdots b_t \in P \subseteq L$.

Since $L$ is open, there is some $m$ such that $w_m B_i^\infty \subseteq L$. This follows again because
$\mathrm{alph}(v_{j_1}) = B_i$. Since $h$ strongly recognizes $L$ and since $h(w_m) = h(u v_1 \cdots v_{j_2})$ by idempotency of $h(v_{j_1} \cdots v_{j_2})$,
we have $u v_1 \cdots v_{j_2} B_i^\infty \subseteq L$. In particular, $u v_1 \cdots v_r B_i^\infty \subseteq L$.
This is true for all $\alpha \in B_1^* b_1 \cdots B_{i-1}^* b_{i-1} R^r$, hence

$B_1^* b_1 \cdots B_{i-1}^* b_{i-1} R^r B_i^\infty \subseteq L$.

By construction, $Q = B_1^* b_1 \cdots B_{i-1}^* b_{i-1} R^r B_i^\infty$ is a closed unambiguous monomial and due to the
normalization, we have $B_i^* b_i \cdots B_t^* b_t A^\infty \subseteq B_i^\infty$ and hence $P \subseteq Q$. □

6.2 Arrow languages and deterministic languages 

The results of this section are very similar to results on deterministic and complement-deterministic
languages which can be found in [16], too. Moreover, the conditions in Proposition 6.4 and
Proposition 6.5 can be complemented by several other equivalent characterizations, see e.g. [16,
Theorem VI.3.7]. One of them is the class of finite Boolean combinations of regular Cantor-open
languages and another one is in terms of the second level of the Borel hierarchy over the Cantor
topology.

We write $s \mathrel{\mathcal{R}} t$ for monoid elements $s, t \in M$ if there exist $x, y \in M$ such that $s = ty$ and
$t = sx$, i.e., if the right-ideals $sM$ and $tM$ are equal. The relation $\mathcal{R}$ is one of Green's relations,
see e.g. [17].
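
For a concrete finite monoid the relation $\mathcal{R}$ can be computed directly from the definition by comparing right ideals. The sketch below assumes the monoid is given as a multiplication table `mult` over the elements `0..n-1`; this encoding and the function names are illustrative only.

```python
def right_ideal(mult, s):
    """Return the right ideal sM as a frozenset."""
    return frozenset(mult[s][x] for x in range(len(mult)))


def r_classes(mult):
    """Group the elements of the monoid by equality of their right ideals."""
    classes = {}
    for s in range(len(mult)):
        classes.setdefault(right_ideal(mult, s), []).append(s)
    return sorted(classes.values())


# U1 (0 = identity, 1 = zero): two singleton R-classes.
print(r_classes([[0, 1], [1, 1]]))
# Z2: a group, so all elements are R-equivalent.
print(r_classes([[0, 1], [1, 0]]))
```

Two elements $s, t$ land in the same list exactly if $sM = tM$, i.e., if $s = ty$ and $t = sx$ for some $x, y \in M$.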

Lemma 6.3 Let $L \subseteq \Gamma^\infty$ be a deterministic language which is strongly recognized by some surjective
homomorphism $h : \Gamma^* \to M$ onto a finite monoid $M$. Let $s, e, t, f, x, y \in M$ such that
$(s, e)$, $(t, f)$ are linked pairs and $s = ty$ and $t = sx$ (thus, $s \mathrel{\mathcal{R}} t$). Assume that

$[s][e]^\omega \cap L \cap \Gamma^\omega \neq \emptyset$.

Then we have $[t][yexf]^\omega \subseteq L$.

Proof: Let $s_0, e_0, f_0, x_0, y_0 \in \Gamma^*$ be words which are mapped to the corresponding elements
$s, e, f, x, y \in M$. We choose $e_0 \neq 1$ nonempty, which we can do due to the assumption. Since $L$ is
deterministic, there exists a set $W \subseteq \Gamma^*$ such that $L \cap \Gamma^\omega = \overrightarrow{W} \cap \Gamma^\omega$. We are going to construct
sequences of words $s_n \in [s]([xf][ye])^n$ and $w_n \in W$ for $n \in \mathbb{N}$ such that

$s_0 < w_0 < s_1 < w_1 < s_2 < w_2 < \cdots$

where $<$ denotes the strict prefix order on words. Thus, the limit defines an infinite word $\alpha$
such that $\alpha \in [s]([xf][ye])^\omega \cap \overrightarrow{W}$. In particular, $\alpha \in L$. Moreover, since $sxf = t$ we have
$\alpha \in [t][yexf]^\omega \cap L$ and hence $[t][yexf]^\omega \subseteq L$ due to strong recognition.

Thus, it is enough to define the sequences $s_n$ and $w_n$ for $n \in \mathbb{N}$ as above. The condition
$s_0 \in [s]([xf][ye])^0$ is satisfied by definition. Let $n \in \mathbb{N}$. Inductively, we may assume that $w_k$ and
$s_m$ are defined as desired for $k < n$ and $m \le n$. We are going to define $w_n$ and $s_{n+1}$. Infinitely
many prefixes of $s_n x_0 f_0 y_0 e_0^\omega$ are in $W$, because $s_n x_0 f_0 y_0 e_0^\omega \in [s][e]^\omega \subseteq L$. Thus we find $w_n \in W$
and $\ell \ge 1$ such that

$s_n < w_n < s_{n+1} = s_n x_0 f_0 y_0 e_0^\ell$.

By induction we see that $s_{n+1} \in [s]([xf][ye])^{n+1}$ because $x_0 f_0 y_0 e_0^\ell \in [xf][ye]$ since $e^2 = e$. □

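Proposition 6.4 Let $L \subseteq \Gamma^\infty$ be strongly recognized by some surjective homomorphism $h : \Gamma^* \to M$
onto a finite monoid $M$. Define

$W = \bigcup \{[s] \subseteq \Gamma^* \mid [s][e]^\omega \subseteq L \text{ for some linked pair } (s, e)\}$.

Then the following four assertions are equivalent:

1. $L = \overrightarrow{W}$.

2. For all linked pairs $(s, e)$, $(t, f)$ with $s \mathrel{\mathcal{R}} t$ we have

$[s][e]^\omega \subseteq L \;\Leftrightarrow\; [t][f]^\omega \subseteq L$.

3. For every linked pair $(s, e)$ we have

$[s][e]^\omega \subseteq L \;\Leftrightarrow\; [s] \subseteq L$.

4. Both $L$ and its complement are arrow languages.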

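Proof: "1 $\Rightarrow$ 2": Let $[s] \subseteq W$ and let $(t, f)$ be a linked pair with $s \mathrel{\mathcal{R}} t$. It is enough to show
$[t][f]^\omega \subseteq L$. If $s = t$, then $[t][f]^\omega \subseteq \overrightarrow{W} = L$. For $s \neq t$ we find $x \neq 1 \neq y$ with $s = ty$ and $t = sx$. It
follows that $[s][xy]^\omega \cap L \cap \Gamma^\omega \neq \emptyset$. Lemma 6.3 yields $[t][yexf]^\omega \subseteq L$ for $e = xy$. But then $[t] \subseteq W$
and $[t][f]^\omega \subseteq \overrightarrow{W} = L$.

"2 $\Rightarrow$ 3": If $[s][e]^\omega \subseteq L$ then by "2" we have $[s][1]^\omega \subseteq L$. Since $[s] \subseteq [s][1]^\omega$, it follows $[s] \subseteq L$.
Conversely, if $[s] \subseteq L$, then strong recognition yields $[s][1]^\omega \subseteq L$; and hence $[s][e]^\omega \subseteq L$ by "2".

"3 $\Rightarrow$ 4": The condition is symmetric in $L$ and its complement. Therefore it is enough to show
that $L$ is an arrow language. We show $L = \overrightarrow{L \cap \Gamma^*}$. Let $[s][e]^\omega \subseteq L$. Then, by "3", we see that
$[s] \subseteq L$ and hence $[s][e]^\omega \subseteq \overrightarrow{[s]} \subseteq \overrightarrow{L \cap \Gamma^*}$. For the other inclusion, let $\alpha \in \overrightarrow{L \cap \Gamma^*}$. Then $\alpha \in \overrightarrow{[s]}$ for
some $s \in M$ with $[s] \cap L \neq \emptyset$. We can find a linked pair $(s, e)$ such that $\alpha \in [s][e]^\omega$. By strong
recognition, $[s] \subseteq [s][1]^\omega \subseteq L$. By "3" we conclude $[s][e]^\omega \subseteq L$ and $\alpha \in L$.

"4 $\Rightarrow$ 1": Since $L$ is an arrow language, it is enough to show $L \cap \Gamma^* = W$. The inclusion
$L \cap \Gamma^* \subseteq W$ is trivial. For the converse assume by contradiction $[s] \cap L = \emptyset$, but $[s][e]^\omega \subseteq L$ for
some linked pair $(s, e)$. Then $[s] \subseteq \Gamma^* \setminus L$. Since the complement of $L$ is an arrow language, we
have $[s][e]^\omega \subseteq \overrightarrow{[s]} \subseteq \overrightarrow{\Gamma^* \setminus L} = \Gamma^\infty \setminus L$, which is a contradiction to $[s][e]^\omega \subseteq L$. Thus, $W \subseteq L \cap \Gamma^*$. □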

The following result yields a simple proof for a Landweber type result in the special case of 
deterministic and complement-deterministic languages. 

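Proposition 6.5 Let $L \subseteq \Gamma^\omega$ be a deterministic language which is strongly recognized by some
surjective homomorphism $h : \Gamma^* \to M$ onto a finite monoid $M$. Let

$W = \bigcup \{[s] \subseteq \Gamma^* \mid [s][e]^\omega \subseteq L \text{ for some linked pair } (s, e)\}$

and $U = \Gamma^* \setminus W$. Then $\overrightarrow{W} \cup \overrightarrow{U} = \Gamma^\infty$ and $\overrightarrow{W} \cap \overrightarrow{U} = \emptyset$, i.e., $\Gamma^\infty$ is a disjoint union of $\overrightarrow{W}$ and $\overrightarrow{U}$.
Moreover, $\overrightarrow{W} \cap \Gamma^\omega = L$ if and only if $L$ is complement-deterministic, too.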

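Proof: Clearly, $\overrightarrow{W} \cup \overrightarrow{U} = \Gamma^\infty$. Assume by contradiction that there is some $\alpha \in \overrightarrow{W} \cap \overrightarrow{U}$. Then
$\alpha \in \Gamma^\omega$ with $\alpha \in \overrightarrow{[s]} \cap \overrightarrow{[t]}$ such that $[s] \subseteq W$ and $[t] \subseteq U$. Using the usual application of Ramsey's
Theorem at those prefixes belonging to $[s]$ or $[t]$, respectively, we see that for some linked pairs
$(s, e)$, $(t, f)$ we have $\alpha \in [s][e]^\omega$ and $\alpha \in [t][f]^\omega$. We have $s = ty$ and $t = sx$ with $x \neq 1 \neq y$,
because $s \neq t$ as $U \cap W = \emptyset$. Since $[s][e]^\omega \cap L \cap \Gamma^\omega \neq \emptyset$, by Lemma 6.3 we have $[t][yexf]^\omega \subseteq L$.
This contradicts $[t] \subseteq U = \Gamma^* \setminus W$.

For the second statement of the proposition: If $\overrightarrow{W} \cap \Gamma^\omega = L$, then by the first statement of
this proposition we have $\Gamma^\omega \setminus L = \overrightarrow{U} \cap \Gamma^\omega$, i.e., $L$ is complement-deterministic.

For the converse, let $L$ be complement-deterministic. Clearly, $L \subseteq \overrightarrow{W} \cap \Gamma^\omega$. Assume by
contradiction that there is some $\alpha \in \overrightarrow{W} \setminus L$ for some $\alpha \in \Gamma^\omega$. Then $\alpha \in \overrightarrow{[s]} \cap [t][f]^\omega$ for $[s] \subseteq W$
and $(t, f)$ is a linked pair with $[t][f]^\omega \subseteq \Gamma^\omega \setminus L$. We have $s = ty$ and $t = sx$ for some $x, y \in M$. By
definition of $W$, we find a linked pair $(s, e)$ such that $[s][e]^\omega \subseteq L$. We have $[s][e]^\omega \cap L \cap \Gamma^\omega \neq \emptyset$ and
$[t][f]^\omega \cap (\Gamma^\infty \setminus L) \cap \Gamma^\omega \neq \emptyset$. Since both $L$ and $\Gamma^\infty \setminus L$ are deterministic, we can apply Lemma 6.3 and
obtain $[t][yexf]^\omega \subseteq L$ and $[s][xfye]^\omega \subseteq \Gamma^\omega \setminus L$. This is a contradiction to strong recognizability,
since $[t][yexf]^\omega \cap [s][xfye]^\omega \neq \emptyset$. □

6.3 Various characterizations of $\Delta_2$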

We are now ready to characterize $\Delta_2$-definable languages $L \subseteq \Gamma^\infty$ over finite and infinite words.

Theorem 6.6 Let $L \subseteq \Gamma^\infty$ be a regular language. The following assertions are equivalent.

1. $L$ is $\Delta_2$-definable.

2. $L$ is $\mathrm{FO}^2$-definable and $L$ is clopen in the alphabetic topology.

3. $L$ is a finite union of unambiguous closed monomials $A_1^* a_1 \cdots A_k^* a_k A^\infty$, i.e., there is no
$1 \le i \le k$ such that $\{a_i, \ldots, a_k\} \subseteq A_i$.

4. $\mathrm{Synt}(L) \in \mathbf{DA}$ and for all linked pairs $(s, e)$, $(t, f)$ with $s \mathrel{\mathcal{R}} t$ (i.e., there exist $x, y \in \mathrm{Synt}(L)$
such that $s = ty$ and $t = sx$) we have

$[s][e]^\omega \subseteq L \;\Leftrightarrow\; [t][f]^\omega \subseteq L$.

5. $L$ is weakly recognized by $h : \Gamma^* \to M$ for some $M \in \mathbf{DA}$, and for all linked pairs $(s, e)$,
$(t, f)$ with $s \mathrel{\mathcal{R}} t$ in $M$ we have $[s][e]^\omega \subseteq L$ if and only if $[t][f]^\omega \subseteq L$.

6. $\mathrm{Synt}(L) \in \mathbf{DA}$ and both $L$ and its complement $\Gamma^\infty \setminus L$ are arrow languages.

Proof: "1 $\Rightarrow$ 2": By Theorem 4.2 and its dual version for $\Pi_2$, we see that $\mathrm{Synt}(L) \in \mathbf{DA}$ and that
$L$ is clopen in the alphabetic topology. From Theorem 5.5 it follows that $L$ is $\mathrm{FO}^2$-definable.

"2 $\Rightarrow$ 3": By Theorem 5.8, $L$ is a finite union of unambiguous monomials. Property "3" now
follows by Lemma 6.2 and Lemma 6.1.

"3 $\Rightarrow$ 1": Theorem 5.8 and Theorem 5.9.

"2 $\Rightarrow$ 4": By Theorem 5.5, we see that $\mathrm{Synt}(L) \in \mathbf{DA}$. Suppose $[s][e]^\omega \subseteq L$ and let $s = ty$
and $t = sx$. Since $L$ is closed we see that $[s][exfy]^\omega \subseteq L$ and by strong recognition we conclude
$[t][fyex]^\omega \subseteq L$. Let $A = \bigcup \{\mathrm{alph}(v) \mid v \in [f]\}$. Since $L$ is open and by strong recognition, there
exists $r \in \mathbb{N}$ such that $[t][fyex]^r A^\infty \subseteq L$. Moreover, $t = tfyex$ and thus, $[t]A^\infty \subseteq L$. In particular,
$[t][f]^\omega \subseteq L$ because $[f] \subseteq A^*$.

"4 $\Rightarrow$ 5": Trivial with $M = \mathrm{Synt}(L)$ and $h = h_L$.

"5 $\Rightarrow$ 2": If $\alpha \in [s][e]^\omega \cap [t][f]^\omega$ for linked pairs $(s, e)$, $(t, f)$, then $s \mathrel{\mathcal{R}} t$. Hence $[s][e]^\omega \subseteq L$ and
$[s][e]^\omega \cap [t][f]^\omega \neq \emptyset$ implies $[t][f]^\omega \subseteq L$. In particular, $h$ strongly recognizes $L$.

Definability in $\mathrm{FO}^2$ follows by Theorem 5.5. By symmetry, it suffices to show that $L$ is open.
Let $\alpha \in [s][e]^\omega \subseteq L$ for some linked pair $(s, e)$ and write $\alpha = u\beta$ with $u \in [s]$ and $\beta \in [e]^\omega \cap A^\infty \cap A^{\mathrm{im}}$
for some $A \subseteq \Gamma$. Let $v \le \beta$ be a prefix such that $v \in [e]$ and $\mathrm{alph}(v) = \mathrm{alph}(\beta)$. We want to show
$uvA^\infty \subseteq L$. Consider $uv\gamma \in \Gamma^\infty$ where $\gamma \in A^\infty$. We have $uv\gamma \in [t][f]^\omega$ for some linked pair $(t, f)$.
Let $v' \le \gamma$ such that $uvv' \in [t]$. Since $\mathrm{Synt}(L) \in \mathbf{DA}$ we have $vv'v \in [e]$ and $s = t \cdot h(v)$. Together
with $t = s \cdot h(v')$ it follows $s \mathrel{\mathcal{R}} t$ and by "4" we obtain $uv\gamma \in [t][f]^\omega \subseteq L$.

"4 $\Leftrightarrow$ 6": This equivalence follows from Proposition 6.4. □

Corollary 6.7 Let $L \subseteq \Gamma^\infty$ be a regular language such that $\mathrm{Synt}(L) \in \mathbf{DA}$. The following
assertions are equivalent:

1. $L$ is clopen in the alphabetic topology.

2. Both $L$ and its complement $\Gamma^\infty \setminus L$ are arrow languages.

Proof: The statement follows from the equivalence of "2" and "6" in Theorem 6.6 since by
Theorem 5.5 the language $L$ is $\mathrm{FO}^2$-definable if and only if $\mathrm{Synt}(L) \in \mathbf{DA}$. □

Remark 6.8 The statement of Corollary 6.7 does not need to hold outside the variety DA. For
example the aperiodic language $L = (ab)^\omega \cup (ab)^* a \subseteq \{a, b\}^\infty$ is an arrow language and its
complement is also an arrow language, but it is not open.

6.4 The intersection of $\Sigma_2$ and $\Pi_2$ over infinite words



The next corollary gives a characterization of the fragment $\Delta_2$ for $\omega$-languages, i.e., we consider
the intersection of $\Sigma_2$ and $\Pi_2$ over infinite words (instead of finite and infinite words). Note that
the language $\Gamma^\omega \subseteq \Gamma^\infty$ of all infinite words is $\Pi_2$-definable, but not $\Sigma_2$-definable as a subset of
$\Gamma^\infty$.

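Corollary 6.9 Let $L \subseteq \Gamma^\omega$ be an $\omega$-regular language. The following assertions are equivalent:

1. $L \in \Pi_2$ and there exists a language $L_\sigma \in \Sigma_2$ such that $L = L_\sigma \cap \Gamma^\omega$.

2. There exist languages $L_\sigma \in \Sigma_2$ and $L_\pi \in \Pi_2$ such that $L = L_\sigma \cap \Gamma^\omega = L_\pi \cap \Gamma^\omega$.

3. $\mathrm{Synt}(L) \in \mathbf{DA}$ and $L$ is deterministic and complement-deterministic.

4. There exists a language $L_\delta \in \Delta_2$ such that $L = L_\delta \cap \Gamma^\omega$.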

Proof: "$1 \Rightarrow 2$": Trivial, since $L = L \cap \Gamma^\omega$ is $\Pi_2$-definable.

"$2 \Rightarrow 3$": By Theorem 5.11 we see that $L$ is $\mathrm{FO}^2$-definable and by Theorem 5.5 we conclude
$\mathrm{Synt}(L) \in \mathbf{DA}$. The complement of $L_\pi$ is $\Sigma_2$-definable, hence $L_\pi$ is closed by Theorem 4.2.
Therefore, $L = L_\pi \cap \Gamma^\omega$ is closed too. By Corollary 3.3 it follows that $L$ is deterministic. Symmetrically
we deduce that $\Gamma^\omega \setminus L$ is also deterministic.

"$3 \Rightarrow 4$": Let $W = \bigcup \{[s] \subseteq \Gamma^* \mid [s][e]^\omega \subseteq L \text{ for some linked pair } (s,e)\}$ and set $L_\delta = \overrightarrow{W}$. By
Proposition 6.5 we have $L = L_\delta \cap \Gamma^\omega$. Moreover, both $L_\delta$ and its complement are arrow languages.
Since $\mathrm{Synt}(L_\delta) = \mathrm{Synt}(L)$ we can apply Theorem 6.6 and conclude $L_\delta \in \Delta_2$.

"$4 \Rightarrow 2$": Trivial with $L_\sigma = L_\pi = L_\delta$. □



6.5 On the construction of examples 

In this section, we relate different classes of linked pairs with the languages recognized by these 
classes. The different classes come from several acceptance conditions of homomorphisms onto 
finite monoids such as weak or strong recognition. For monoids in DA, the results in the previous 
sections allow us to deduce non-trivial properties of the languages recognized by the respective 
classes of linked pairs. 

Let $h : \Gamma^* \to M$ be a surjective homomorphism onto a finite monoid $M$. By definition of
weak recognition, for every linked pair $(s,e)$ the language $[s][e]^\omega$ is weakly recognized by $h$ and
every language which is weakly recognized by $h$ is a union of such languages. We say that two
linked pairs $(s,e)$, $(t,f)$ are conjugated if $e = xy$, $f = yx$, and $t = sx$ for some $x, y \in M$. It
is easy to verify that conjugacy forms an equivalence relation on the set of linked pairs and that
$[s][e]^\omega \cap [t][f]^\omega \neq \emptyset$ if and only if the linked pairs $(s,e)$ and $(t,f)$ are conjugated. We define for a
linked pair $(s,e)$ the class $[s,e]$ as a language by:

$$[s,e] = \bigcup \{ [t][f]^\omega \mid (s,e) \text{ and } (t,f) \text{ are conjugated} \} \subseteq \Gamma^\infty.$$

The language $[s,e]$ is strongly recognized by $h$, and every language which is strongly recognized
by $h$ is a union of such languages.

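The set $\overrightarrow{[s]}$ is an arrow language which is weakly recognized by $h$ since

$$\overrightarrow{[s]} = \bigcup \{ [s][f]^\omega \mid (s,f) \text{ is a linked pair for some } f \in M \}.$$

If an arrow language $L \subseteq \Gamma^\infty$ is weakly recognized by $h$, then $L$ is a union of languages of the form
$\overrightarrow{[s]}$ since $L = \overrightarrow{L \cap \Gamma^*}$ and $L \cap \Gamma^* = \bigcup \{[s] \mid [s] \cap L \neq \emptyset\}$. In general, $\overrightarrow{[s]}$ is not strongly
recognized by $M$.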

For every $s \in M$ we denote by $\mathcal{R}_s$ the $\mathcal{R}$-class of $s$, i.e., $\mathcal{R}_s = \{t \in M \mid sM = tM\}$. We have

$$\overrightarrow{[\mathcal{R}_s]} = \bigcup \{ [s,e] \mid \text{there exists } e \in M \text{ such that } (s,e) \text{ is a linked pair} \}.$$

By Proposition 6.4, both $\overrightarrow{[\mathcal{R}_s]}$ and its complement are arrow languages which are strongly
recognized by $h$. Conversely, if $L$ and its complement are arrow languages which are strongly recognized
by $h$, then $L$ is a union of languages of the form $\overrightarrow{[\mathcal{R}_s]}$. Moreover, as shown in the proof of
Theorem 6.6, if $L$ and its complement are arrow languages and if $L$ is weakly recognized by $h$, then, in
fact, $L$ is strongly recognized by $h$.

Therefore, for some given $h : \Gamma^* \to M$ we find examples as follows:

• $[s][e]^\omega$ are languages which are weakly recognizable by $h$.

• $[s,e]$ are languages which are strongly recognizable by $h$.

• $\overrightarrow{[s]}$ are arrow languages which are weakly recognizable by $h$.

• $\overrightarrow{[\mathcal{R}_s]}$ are arrow languages whose complement is also an arrow language, and which are strongly
recognizable by $h$.

More concretely: If $M \in \mathbf{DA}$ then, by Theorem 5.5, the languages which are strongly recognizable
by $h$ are $\mathrm{FO}^2$-definable, but by Example 5.7 weak recognition is not enough to guarantee
$\mathrm{FO}^2$-definability. By Theorem 6.6, languages $L$ are $\Delta_2$-definable if they are strongly recognizable
by $h$ and if both $L$ and $\Gamma^\infty \setminus L$ are arrow languages.

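Therefore we can produce examples along the following line: We start with some linked pair
$(s,e)$; this yields $[s][e]^\omega$ which is weakly recognizable and $[s,e]$ which is strongly recognized by $h$.
The arrow language $\overrightarrow{[s]}$ is incomparable with $[s,e]$, in general. By definition, $\overrightarrow{[s]}$ is a deterministic
language. Moving to $\overrightarrow{[\mathcal{R}_s]}$ yields an arrow language whose complement is an arrow language,
too. We have:

$$[s][e]^\omega \subseteq \overrightarrow{[s]} \subseteq \overrightarrow{[\mathcal{R}_s]} \qquad \text{and} \qquad [s][e]^\omega \subseteq [s,e] \subseteq \overrightarrow{[\mathcal{R}_s]}.$$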

Example 6.10 Let $\Gamma = \{a, b, c\}$ and $P = c^* a \Gamma^* b \Gamma^* c$. The syntactic monoid of $P$ is in $\mathbf{DA}$,
because $P$ is $\mathrm{FO}^2$-definable. We can write $\mathrm{Synt}(P) = \{1, a, b, ab, c, ac, bc, abc\}$ where the elements
correspond to minimal length representatives of the classes induced by the syntactic congruence.
To see this, observe that $P = c^* a \Gamma^* b \Gamma^* \cap \Gamma^* c$. The syntactic monoid of $c^* a \Gamma^* b \Gamma^*$ has just the four
elements in $\{1, a, b, ab\}$. For $\mathrm{Synt}(P)$ we copy these classes and add the information whether it
represents a word ending in $c$. All elements of $\mathrm{Synt}(P)$ are idempotent and its egg-box representation
(see e.g. [17]) is depicted in Figure 2(c).

We have $P = [abc]$. The language $L = P^\omega = [abc]^\omega$ is weakly recognizable by $\mathrm{Synt}(P)$, too.
All words $\alpha \in L$ have infinitely many occurrences of the factor $ca$ and $\mathrm{im}(\alpha) = \Gamma$. In particular,
$L$ is not open in the strict alphabetic topology. By Lemma 5.2, the language $L$ is not strongly
recognizable by any monoid in $\mathbf{DA}$.

The conjugacy class of the linked pair $(abc, abc)$ is $\{(ab, b), (ab, ab), (abc, bc), (abc, abc)\}$ and
$[abc, abc] = c^* a \Gamma^* b \Gamma^\infty \cap (\Gamma^* b)^\omega$. The language $[abc, abc]$ is strongly recognizable by $\mathrm{Synt}(P) \in \mathbf{DA}$.
By Theorem 5.5 it is $\mathrm{FO}^2$-definable. The set $[abc, abc]$ is not open in the alphabetic topology. By
Theorem 6.6, $[abc, abc]$ is not $\Delta_2$-definable.

The set $\overrightarrow{[abc]} = \overrightarrow{P} = c^* a \Gamma^* b \Gamma^\infty \cap (\Gamma^* c)^\omega$ is an arrow language which is weakly recognizable
by $h$. It is not strongly recognized by the syntactic homomorphism of $P$ since
$[abc][abc]^\omega \subseteq \overrightarrow{[abc]} \cap [abc, abc]$ but $[abc, abc] \not\subseteq \overrightarrow{[abc]}$. On the other hand, $\overrightarrow{[abc]}$ is $\mathrm{FO}^2$-definable, and therefore,
by Theorem 5.5, it is strongly recognizable by some other homomorphism onto a monoid in $\mathbf{DA}$.

[Figure 2: Egg-box representations. (a) Monoid $M$ in Example 2.2. (c) $\mathrm{Synt}(P)$ in Example 6.10.]

The $\mathcal{R}$-class of $abc$ is $\mathcal{R}_{abc} = \{ab, abc\}$. Hence $\overrightarrow{[\mathcal{R}_{abc}]} = c^* a \Gamma^* b \Gamma^\infty$. By Proposition 6.4 the
complement of $\overrightarrow{[\mathcal{R}_{abc}]}$ is also an arrow language; and by Theorem 6.6 the language $\overrightarrow{[\mathcal{R}_{abc}]}$ is
$\Delta_2$-definable. Indeed, for $\overrightarrow{[\mathcal{R}_{abc}]}$ it is enough to say that there is some $b$ and there is some $a$ with no
$b$ to its left. This is a $\Sigma_2$-sentence. The equivalent $\Pi_2$-sentence says that there is some $b$ and for
all $b$ there exists some $a$ to its left. It is also deterministic and complement-deterministic. ◇
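
The finite computations in Example 6.10 can be replayed mechanically. The following Python sketch is only an illustration and not part of the development above. It assumes a hypothetical encoding of $\mathrm{Synt}(P)$ as pairs `(m, ends_in_c)`, following the remark that one copies the four classes of $c^* a \Gamma^* b \Gamma^*$ and records whether a word ends in $c$. All names (`base_mul`, `mul`, `conjugacy_class`, `r_class`, `linked_pair_of`) are made up for this sketch.

```python
from itertools import product

# Assumed encoding (not from the paper): an element of Synt(P) is a pair
# (m, ends_in_c), where m ranges over the four-element monoid {1, a, b, ab}
# of c*a Gamma* b Gamma* and ("1", False) stands for the empty word.
ONE = ("1", False)
M = [(m, flag) for m in ("1", "a", "b", "ab") for flag in (False, True)]
LETTER = {"a": ("a", False), "b": ("b", False), "c": ("1", True)}


def base_mul(m, n):
    """Product in {1, a, b, ab}: 1 is neutral, b and ab are left zeros."""
    if m == "1":
        return n
    if n == "1" or m in ("b", "ab"):
        return m
    return "a" if n == "a" else "ab"  # remaining case: m == "a"


def mul(x, y):
    """Product in Synt(P): multiply the base parts, keep the last flag."""
    if y == ONE:
        return x
    return (base_mul(x[0], y[0]), y[1])


def name(x):
    """Minimal-length representative, e.g. ("ab", True) -> "abc"."""
    if x == ONE:
        return "1"
    return (x[0] if x[0] != "1" else "") + ("c" if x[1] else "")


def h(word):
    """Image of a finite word under the assumed syntactic homomorphism."""
    result = ONE
    for letter in word:
        result = mul(result, LETTER[letter])
    return result


def linked_pairs():
    """All pairs (s, e) with s*e == s and e*e == e."""
    return [(s, e) for s, e in product(M, M) if mul(s, e) == s and mul(e, e) == e]


def conjugated(p, q):
    """(s, e) and (t, f) are conjugated iff e = xy, f = yx, t = sx for some x, y."""
    (s, e), (t, f) = p, q
    return any(
        mul(x, y) == e and mul(y, x) == f and mul(s, x) == t
        for x, y in product(M, M)
    )


def conjugacy_class(p):
    """Names of all linked pairs conjugated to the linked pair p."""
    return {(name(t), name(f)) for t, f in linked_pairs() if conjugated(p, (t, f))}


def right_ideal(s):
    """The set sM."""
    return frozenset(mul(s, x) for x in M)


def r_class(s):
    """Names of all t with sM == tM."""
    return {name(t) for t in M if right_ideal(t) == right_ideal(s)}


def linked_pair_of(u, v):
    """A linked pair (s, e) such that u v^omega lies in [s][e]^omega."""
    e = h(v)
    while mul(e, e) != e:  # move to an idempotent power of h(v)
        e = mul(e, h(v))
    return (name(mul(h(u), e)), name(e))


abc = h("abc")
print(sorted(conjugacy_class((abc, abc))))
print(sorted(r_class(abc)))
print(linked_pair_of("ab", "b"))
```

Under this encoding, the computed conjugacy class of $(abc, abc)$ and the $\mathcal{R}$-class of $abc$ agree with the sets stated in the example, and the ultimately periodic word $ab\,b^\omega$ is assigned the linked pair $(ab, b)$, which belongs to that conjugacy class. The search is brute force over all pairs of monoid elements, which is adequate only for small monoids.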



7 Summary 

We have given language-theoretic, algebraic and topological characterizations for several first-order
fragments over infinite words. Since $\mathrm{FO}^2$ and $\Delta_2$ have the same expressive power only when
restricted to some fixed set of letters occurring infinitely often (Thm. 5.11), the picture becomes
more complex than in the case of finite words. Among other results, we have shown the relations in
Figure 3 between the fragments $\mathrm{FO}^2$, $\Sigma_2$, $\Pi_2$, and $\Delta_2 = \Sigma_2 \cap \Pi_2$ (for completeness we included the
fragments $\Sigma_1$, $\Pi_1$, their Boolean closure $\mathbb{B}\Sigma_1$, and the Boolean closure $\mathbb{B}\Sigma_2$ of $\Sigma_2$ in the picture).
The intersection $\Delta_1 = \Sigma_1 \cap \Pi_1$ contains only the trivial languages $\emptyset$ and $\Gamma^\infty$. The language $L_9$
in Figure 3 is the closure of $L_4$ within the alphabetic topology. The interior of $L_4$ (as well as of
any other language in $\Gamma^\omega$) with respect to the alphabetic topology is empty. Another example in
$\Sigma_2 \cap \mathrm{FO}^2$ which is not in $\Pi_2$ is the set of all finite words $\Gamma^*$. Symmetrically, the language $\Gamma^\omega$ of all
infinite words is in $\Pi_2 \cap \mathrm{FO}^2$ but not in $\Sigma_2$.

In order to sketch the main results on small first-order fragments over finite and infinite words
in Table 1, we introduce some terminology. By Pol we denote the language class of polynomials,
UPol are unambiguous polynomials, and restricted UPol is a proper subclass of UPol. Simple
polynomials are finite unions of languages of the form $\Gamma^* a_1 \cdots \Gamma^* a_n \Gamma^\infty$. A language $L \subseteq \Gamma^\infty$ is
piecewise testable if there exists some $k \in \mathbb{N}$ such that for every $\alpha \in \Gamma^\infty$ membership in $L$ only
depends on the set of scattered subwords of $\alpha$ of length less than $k$. The first-order fragment $\Sigma_1$
consists of first-order sentences in prenex normal form without universal quantifiers; its negations are
in $\Pi_1$ and its Boolean closure is $\mathbb{B}\Sigma_1$.
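
To make the notion of scattered subwords concrete, here is a small Python sketch for finite words only. It is an illustration under our own naming (`scattered_subwords` and `same_pieces` are hypothetical helpers, not notation from the paper), and it does not address infinite words, where the set of scattered subwords would have to be determined from a finite description of $\alpha$.

```python
from itertools import combinations


def scattered_subwords(word, k):
    """Return the set of scattered subwords of `word` of length less than k."""
    result = set()
    # Lengths 0, 1, ..., min(k - 1, len(word)).
    for length in range(min(k, len(word) + 1)):
        # Choosing increasing positions yields exactly the scattered subwords.
        for positions in combinations(range(len(word)), length):
            result.add("".join(word[i] for i in positions))
    return result


def same_pieces(u, v, k):
    """True if u and v have the same scattered subwords of length less than k."""
    return scattered_subwords(u, k) == scattered_subwords(v, k)


print(sorted(scattered_subwords("abca", 3)))
print(same_pieces("ab", "aab", 2), same_pieces("ab", "aab", 3))
```

A language with parameter $k$ in the definition above cannot separate two words for which `same_pieces(u, v, k)` holds. For instance, `ab` and `aab` agree on all scattered subwords of length less than 2 but differ on the subword `aa` of length 2.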

All of the algebraic properties in Table 1 are decidable, since the syntactic monoid of a regular 
language is effectively computable [16, 27]. Together with the PSPACE-completeness of the prob- 
lem whether a language is closed in the alphabetic topology (Thm. 3.5), this yields decidability of 
the membership problem for the respective first-order fragments as a corollary. Decidability was 
shown before by Wilke [31] for $\mathrm{FO}^2$ and by Bojańczyk [2] for $\Sigma_2$. The characterization for the
fragment $\Sigma_1$ is due to Pin [18]; see also [16]. The same holds for the Boolean closure of $\Sigma_1$ except
for the topological part of the "Algebra + Topology" characterization, which is a consequence of
Corollary 6.7.



8 Outlook and open problems 

By definition, $\Sigma_1$-definable languages are open in the Cantor topology. We introduced an
alphabetic topology such that $\Sigma_2$-definable languages are open in this topology. Therefore, an
interesting question is whether it is possible to extend this topological approach to higher levels
of the first-order alternation hierarchy. To date, even over finite words no decidable characterization
of the Boolean closure of $\Sigma_2$ is known. In case that a decidable criterion is found, it might
lead to a decidable criterion for infinite words simply by adding a condition of the form "$L$ and
its complement are in the second level of the Borel hierarchy of the alphabetic topology". Another
possible way to generalize our approach might be combinations of algebraic and topological
characterizations for fragments with successor predicate suc such as $\mathrm{FO}^2[<, \mathrm{suc}]$ or $\Sigma_2[<, \mathrm{suc}]$. A
characterization of those languages which are weakly recognizable by monoids in $\mathbf{DA}$ is also open.

Here $\Gamma = \{a, b, c\}$ and

$L_1$ = "there exists a factor $ab$" = $\Gamma^* ab \Gamma^\infty$,
$L_2$ = "finitely many $a$'s" = $\Gamma^* \{b, c\}^\infty$,
$L_3$ = "there is a factor $ab$ but no factor $ba$" = $L_1 \cap L_{10}$,
$L_4$ = "finitely many $a$'s and infinitely many $b$'s" = $L_2 \cap L_9$,
$L_5$ = "the first $a$ occurs before the first $b$" = $c^* a \Gamma^* b \Gamma^\infty$,
$L_6$ = "some $a$ occurs before some $b$ but no $a$ occurs before any $c$" = $\Gamma^* a \Gamma^* b \Gamma^\infty \cap \{b, c\}^* a \{a, b\}^\infty = L_7 \cap L_8$,
$L_7$ = "some $a$ occurs before some $b$" = $\Gamma^* a \Gamma^* b \Gamma^\infty$,
$L_8$ = "no $a$ occurs before any $c$" = $\{b, c\}^* a \{a, b\}^\infty \cup \{b, c\}^\infty$,
$L_9$ = "infinitely many $b$'s" = $(\Gamma^* b)^\omega$,
$L_{10}$ = "there is no factor $ba$" = $\Gamma^\infty \setminus \Gamma^* ba \Gamma^\infty$.

Figure 3: Some examples and the relations between the different fragments